Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -55,6 +55,8 @@ Finally, there is current work in progress to support a combination of [multiple
* [Remote HPC Configuration](https://pysqa.readthedocs.io/en/latest/advanced.html#remote-hpc-configuration)
* [Access to Multiple HPCs](https://pysqa.readthedocs.io/en/latest/advanced.html#access-to-multiple-hpcs)
* [Debugging](https://pysqa.readthedocs.io/en/latest/debug.html)
* [Local Queuing System](https://pysqa.readthedocs.io/en/latest/debug.html#local-queuing-system)
* [Remote HPC](https://pysqa.readthedocs.io/en/latest/debug.html#remote-hpc)

# License
`pysqa` is released under the BSD license https://github.com/pyiron/pysqa/blob/main/LICENSE . It is a spin-off of the `pyiron` project https://github.com/pyiron/pyiron therefore if you use `pysqa` for calculation which result in a scientific publication, please cite:
Expand Down
13 changes: 12 additions & 1 deletion docs/source/debug.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,16 @@
# Debugging
The configuration of a queuing system adapter, in particular in a remote configuration with a local installation of `pysqa` communicating to a remote installation on your HPC can be tricky. To simplify the process `pysqa` provides a series of utility functions:
The configuration of a queuing system adapter, in particular in a remote configuration with a local installation of `pysqa` communicating to a remote installation on your HPC can be tricky.

## Local Queuing System
To simplify the process `pysqa` provides a series of steps for debugging:

* When `pysqa` submits a calculation to a queuing system it creates an `run_queue.sh` script. You can submit this script using your batch command e.g. `sbatch` for `SLURM` and take a look at the error message.
* The error message the queuing system returns when submitting the job is also stored in the `pysqa.err` file.
* Finally, if the `run_queue.sh` script does not match the variables you provided, then you can test your template using `jinja2`: `Template(open("~/.queues/queue.sh", "r").read()).render(**kwargs)` here `"~/.queues/queue.sh"` is the path to the queuing system submit script you want to use and `**kwargs` are the arguments you provide to the `submit_job()` function.

## Remote HPC
The failure to submit to a remote HPC cluster can be related with to an issue with the local `pysqa` configuration or an issue with the remote `pysqa` configuration. To identify which part is causing the issue, it is recommended to first test the remote `pysqa` installation on the remote HPC cluster:

* Login to the remote HPC cluster and import `pysqa` on a python shell.
* Validate the queue configuration by importing the queue adapter using `from pysqa import QueueAdapter` then initialize the object from the configuration dictionary `qa = QueueAdapter(directory="~/.queues")`. The current configuration can be printed using `qa.config`.
* Try to submit a calculation to print the hostname from the python shell on the remote HPC cluster using the `qa.submit_job(command="hostname")`.
Expand Down