1 change: 1 addition & 0 deletions .gitignore
@@ -18,3 +18,4 @@ src/Makevars
windows
/doc/
/Meta/
.idea/
2 changes: 1 addition & 1 deletion DESCRIPTION
@@ -1,5 +1,5 @@
Package: clustermq
Title: Evaluate Function Calls on HPC Schedulers (LSF, SGE, SLURM, PBS/Torque)
Title: Evaluate Function Calls on HPC Schedulers (LSF, SGE, GCS, OCS, SLURM, PBS/Torque)
Version: 0.9.9
Authors@R: c(
person('Michael', 'Schubert', email='mschu.dev@gmail.com',
2 changes: 1 addition & 1 deletion R/clustermq-package.r
@@ -1,4 +1,4 @@
#' Evaluate Function Calls on HPC Schedulers (LSF, SGE, SLURM)
#' Evaluate Function Calls on HPC Schedulers (LSF, SGE, GCS, OCS, SLURM)
#'
#' Provides the \code{Q} function to send arbitrary function calls to
#' workers on HPC schedulers without relying on network-mounted storage.
48 changes: 48 additions & 0 deletions R/qsys_sge.r
@@ -57,6 +57,54 @@ SGE = R6::R6Class("SGE",
cloneable = FALSE
)

#' Class for Open Cluster Scheduler (OCS)
OCS = R6::R6Class("OCS",
inherit = QSys,

public = list(
Comment on lines +61 to +64

mschubert (Owner), Mar 8, 2026:
Why is the SGE initializer not reused here? Job names are guaranteed to be unique within clustermq; but if IDs are better, we should use them in SGE as well.

Author:
I have no access to SGE and do not know if job names were unique back then with the old Sun Microsystems release.

mschubert (Owner):
I feel fairly strongly that we should duplicate this code only if necessary. The ideal implementation would be shared between SGE, OCS and GCS.

initialize = function(addr, n_jobs, master, ..., template=getOption("clustermq.template", class(self)[1]),
log_worker=FALSE, log_file=NULL, verbose=TRUE) {
super$initialize(addr=addr, master=master, template=template)

opts = private$fill_options(n_jobs=n_jobs, ...)
filled = fill_template(private$template, opts, required=c("master", "n_jobs"))
qsub_stdout = system2("qsub", input=filled, stdout=TRUE)

status = attr(qsub_stdout, "status")
if (!is.null(status) && status != 0)
private$template_error(class(self)[1], status, filled)

private$job_id = regmatches(qsub_stdout, regexpr("^[0-9]+", qsub_stdout))
if (length(private$job_id) == 0)
private$template_error(class(self)[1], qsub_stdout, filled)

if (verbose)
message("Submitted ", n_jobs, " worker tasks to ", class(self)[1], " as array job ", private$job_id, " ...")

private$master$add_pending_workers(n_jobs)
},

cleanup = function(success, timeout) {
system(paste("qdel", private$job_id), ignore.stdout=TRUE, ignore.stderr=TRUE, wait=FALSE)
}
),

private = list(
job_id = NULL
),

cloneable = FALSE
)
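
The initializer above extracts the array job ID from the leading digits of the `qsub` output. A minimal sketch of that parsing step, using a hypothetical `-terse` output string (the exact format depends on the scheduler; `"1234.1-10:1"` is an assumed example for an array job):

```r
# Hypothetical `qsub -terse` output for an array job:
# "<job_id>.<first_task>-<last_task>:<step>"
qsub_stdout <- "1234.1-10:1"

# Same extraction as in the initializer: leading digits are the job ID
job_id <- regmatches(qsub_stdout, regexpr("^[0-9]+", qsub_stdout))
stopifnot(identical(job_id, "1234"))

# A non-numeric first line (e.g. an error message) yields a length-0
# result, which the initializer treats as a submission error
bad <- regmatches("error: no suitable queues",
                  regexpr("^[0-9]+", "error: no suitable queues"))
length(bad)  # 0
```

This is also why a failed submission is caught twice: once via the `qsub` exit status, and once when no numeric job ID can be parsed from its output.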

#' Class for Gridware Cluster Scheduler (GCS)
GCS = R6::R6Class("GCS",
inherit = OCS,
cloneable = FALSE

# no changes needed, but we want to have a separate class for GCS to allow for GCS-specific
# templates and enterprise edition options
)

PBS = R6::R6Class("PBS",
inherit = SGE,

5 changes: 2 additions & 3 deletions R/zzz.r
@@ -12,7 +12,7 @@
if (length(qsys_default) == 0) {
qname = c("SLURM", "LSF", "SGE", "LOCAL")
exec = Sys.which(c("sbatch", "bsub", "qsub"))
select = c(which(nchar(exec) > 0), 4)[1]
select = c(which(nchar(exec) > 0), 6)[1]
qsys_default = qname[select]
}

@@ -26,8 +26,7 @@
#' @keywords internal
.onAttach = function(libname, pkgname) {
if (is.null(getOption("clustermq.scheduler"))) {
packageStartupMessage("* Option 'clustermq.scheduler' not set, ",
"defaulting to ", sQuote(qsys_default))
packageStartupMessage("* Option 'clustermq.scheduler' not set, ", "defaulting to ", sQuote(qsys_default))
packageStartupMessage("--- see: https://mschubert.github.io/clustermq/articles/userguide.html#configuration")
}
if (!libzmq_has_draft()) {
2 changes: 2 additions & 0 deletions README.md
@@ -51,6 +51,8 @@ schedulers](https://mschubert.github.io/clustermq/articles/userguide.html#config
* [SLURM](https://mschubert.github.io/clustermq/articles/userguide.html#slurm) - *should work without setup*
* [LSF](https://mschubert.github.io/clustermq/articles/userguide.html#lsf) - *should work without setup*
* [SGE](https://mschubert.github.io/clustermq/articles/userguide.html#sge) - *may require configuration*
* [GCS](https://mschubert.github.io/clustermq/articles/userguide.html#gcs) - *needs* `options(clustermq.scheduler="GCS")`
* [OCS](https://mschubert.github.io/clustermq/articles/userguide.html#ocs) - *needs* `options(clustermq.scheduler="OCS")`
* [PBS](https://mschubert.github.io/clustermq/articles/userguide.html#pbs)/[Torque](https://mschubert.github.io/clustermq/articles/userguide.html#torque) - *needs* `options(clustermq.scheduler="PBS"/"Torque")`
* via [SSH](https://mschubert.github.io/clustermq/articles/userguide.html#ssh-connector) -
*needs* `options(clustermq.scheduler="ssh", clustermq.ssh.host=<yourhost>)`
49 changes: 49 additions & 0 deletions inst/GCS.tmpl
@@ -0,0 +1,49 @@
# Submit client should only show the job ID of the new job on success
#$ -terse

# Name of the job visible in GCS
#$ -N {{ job_name }}

# Join error and output file.
#$ -j y

# Location of the output file
#$ -o {{ log_file | /dev/null }}

# Start R job in the working directory
#$ -cwd

# Export the full environment to the R job (e.g. if *LD_LIBRARY_PATH* is required).
# Depending on security settings might require a cluster manager to set
# ENABLE_SUBMIT_LIB_PATH=1 as *qmaster_param*
#$ -V

# Spawns workload as tasks of an array job into the scheduler (one job with multiple tasks)
#$ -t 1-{{ n_jobs }}

# Each array task will allocate one slot in the cluster, unless otherwise specified.
#$ -pe mytestpe {{ cores | 1 }}

# Per slot the job will get one power core (C), assuming R code is single-threaded, unless otherwise specified.
#$ -bunit C
#$ -bamount {{ threads | 1 }}

# Cores on a host are packed (cores on a die or chiplet sharing same NUMA node and caches if possible)
#$ -bstrategy packed
#$ -btype host

# The scheduler will do the binding via *HWLOC*.
# Change to *env* if scheduler should make binding decision but not do the binding itself.
#$ -binstance set

# Allows to set resource requests like memory (1 GB [in bytes])
# to set runtime limits (1 hour [in seconds])
# or to influence scheduler resource selection (job will be executed in all.q queue)
#$ -l mem_free={{ memory | 1073741824 }},h_rt={{ walltime | 3600 }},q=all.q

# Tag the job so that it can be identified later on (e.g. in a JSV script before
# submission so the job can get adapted or for filtering later on)
#$ -ac application=clustermq

ulimit -v $(( 1024 * {{ memory | 4096 }} ))
CMQ_AUTH={{ auth }} R --no-save --no-restore -e 'clustermq:::worker("{{ master }}")'
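
The `{{ ... }}` placeholders in this template are filled by clustermq at submission time, falling back to the `| default` value when no key is supplied. As a sketch of per-call overrides (`Q()`'s `template` argument is part of the clustermq API; the resource values here are illustrative assumptions, not recommendations):

```r
library(clustermq)

# Fill {{ memory }} (bytes) and {{ walltime }} (seconds) with per-call
# values; any key not supplied falls back to the template's default
Q(function(x) sqrt(x), x = 1:10, n_jobs = 2,
  template = list(memory = 2147483648, walltime = 7200))
```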
37 changes: 37 additions & 0 deletions inst/OCS.tmpl
@@ -0,0 +1,37 @@
# Submit client should only show the job ID of the new job on success
#$ -terse

# Name of the job visible in OCS
#$ -N {{ job_name }}

# Join error and output file
#$ -j y

# Location of the output file
#$ -o {{ log_file | /dev/null }}

# Start R job in the working directory
#$ -cwd

# Export the full environment to the R job (e.g. if *LD_LIBRARY_PATH* is required).
# Depending on security settings, this might require a cluster manager to set
# ENABLE_SUBMIT_LIB_PATH=1 as *qmaster_param*
#$ -V

# Spawns workload as tasks of an array job into the scheduler (one job with multiple tasks)
#$ -t 1-{{ n_jobs }}

# Each array task will allocate one slot in the cluster, unless otherwise specified.
#$ -pe mytestpe {{ cores | 1 }}

# Per slot the job will get one power core (C), assuming R code is single-threaded, unless otherwise specified.
#$ -bunit C
#$ -bamount {{ threads | 1 }}

# Allows to set resource requests like memory (default 1 GB in bytes)
# to set runtime limits (default 1 hour in seconds)
# or to influence scheduler resource selection (job will be executed in all.q)
#$ -l mem_free={{ memory | 1073741824 }},h_rt={{ runtime | 3600 }},q=all.q

ulimit -v $(( 1024 * {{ memory | 4096 }} ))
CMQ_AUTH={{ auth }} R --no-save --no-restore -e 'clustermq:::worker("{{ master }}")'
4 changes: 2 additions & 2 deletions vignettes/faq.Rmd
@@ -71,7 +71,7 @@ Running 1 calculations (5 objs/19.4 Kb common; 1 calls/chunk) ...

You will see this every time your jobs are queued but not yet started.
Depending on how busy your HPC is, this may take a long time. You can check the
queueing status of your jobs in the terminal with _e.g._ `qstat` (SGE), `bjobs`
queueing status of your jobs in the terminal with _e.g._ `qstat` (SGE, GCS or OCS), `bjobs`
(LSF), or `sinfo` (SLURM).

If your jobs are already finished, this likely means that the `clustermq`
@@ -201,7 +201,7 @@ Alternatively, you can create a script that uses SSH to execute the scheduler
on the login node. For this, you will need an SSH client in the container,
[keys set up for password-less login](https://www.digitalocean.com/community/tutorials/how-to-configure-ssh-key-based-authentication-on-a-linux-server),
and create a script to call the scheduler on the login node via ssh (e.g.
`~/bin/qsub` for SGE/PBS/Torque, `bsub` for LSF and `sbatch` for Slurm):
`~/bin/qsub` for SGE/GCS/OCS/PBS/Torque, `bsub` for LSF and `sbatch` for Slurm):

```{sh eval=FALSE}
#!/bin/bash
2 changes: 2 additions & 0 deletions vignettes/technicaldocs.Rmd
@@ -46,6 +46,8 @@ functions for submitting and cleaning up jobs:
|- Multicore
|- LSF
+ SGE
|- GCS
|- OCS
|- PBS
|- Torque
|- etc.
113 changes: 112 additions & 1 deletion vignettes/userguide.Rmd
@@ -82,6 +82,8 @@ To set up a scheduler explicitly, see the following links:
* [SLURM](#slurm) - *should work without setup*
* [LSF](#lsf) - *should work without setup*
* [SGE](#sge) - *may require configuration*
* [GCS](#gcs) - *needs* `options(clustermq.scheduler="GCS")`
* [OCS](#ocs) - *needs* `options(clustermq.scheduler="OCS")`
* [PBS](#pbs)/[Torque](#torque) - *needs* `options(clustermq.scheduler="PBS"/"Torque")`
* you can suggest another scheduler by [opening an
issue](https://github.com/mschubert/clustermq/issues)
@@ -284,7 +286,7 @@ time after you restart R.

* `clustermq.scheduler` - One of the supported
[`clustermq` schedulers](#configuration); options are `"LOCAL"`,
`"multiprocess"`, `"multicore"`, `"lsf"`, `"sge"`, `"slurm"`, `"pbs"`,
`"multiprocess"`, `"multicore"`, `"lsf"`, `"sge"`, `"gcs"`, `"ocs"`, `"slurm"`, `"pbs"`,
`"Torque"`, or `"ssh"` (default is the HPC scheduler found in `$PATH`,
otherwise `"LOCAL"`)
* `clustermq.host` - The name of the node or device for constructing the
@@ -481,6 +483,115 @@ In this file, `#BSUB-*` defines command-line arguments to the `bsub` program.
Once this is done, the package will use your settings and no longer warn you of
the missing options.


### GCS
Comment on this section

mschubert (Owner):
The templates and docs of GCS/OCS are a lot more verbose than for the other schedulers. We should try to be consistent here.

Author:
Does this mean that I should remove helpful comments?

mschubert (Owner), Mar 9, 2026:
Maybe we can make them a bit more concise. Can you make it more in the style of the SGE docs?

Set the following options in your _R_ session that will submit jobs:

```{r eval=FALSE}
options(
clustermq.scheduler = "gcs",
clustermq.template = "/path/to/file/below" # if using your own template
)
```

To supply your own template, save the contents below with any desired changes
to a file and have `clustermq.template` point to it.

```{sh eval=FALSE}
#$ -terse # Show job ID
#$ -N {{ job_name }} # Job name
#$ -j y # Combine stdout/stderr into one file
#$ -o {{ log_file | /dev/null }} # Output file
#$ -cwd # Use cwd as working directory
#$ -V # Export all environment variables to the job
# Depending on security settings might require a
# cluster manager to set ENABLE_SUBMIT_LIB_PATH=1 as
# *qmaster_param* to export *LD_LIBRARY_PATH*
#$ -t 1-{{ n_jobs }} # Spawns workload as tasks of an array job
#$ -pe mytestpe {{ cores | 1 }} # Allocate one slot per task
#$ -bunit C # Allocate *power core(s)* per slot
#$ -bamount {{ threads | 1 }} # Allocate *one* power core per slot
#$ -bstrategy packed # *Pack* cores on a host to share NUMA nodes and caches
#$ -btype host # Bind per host
#$ -binstance set # Use HWLOC to bind cores
#$ -l mem_free={{ memory | 1073741824 }},h_rt={{ walltime | 3600 }},q=all.q # Resource requests and limits
#$ -ac application=clustermq,hostname={{ master }} # Tag the job

ulimit -v $(( 1024 * {{ memory | 4096 }} ))
CMQ_AUTH={{ auth }} R --no-save --no-restore -e 'clustermq:::worker("{{ master }}")'
```

In this file, `#$-*` defines command-line arguments to the `qsub` program.

* The objects used in the template example (`mytestpe`, `all.q`) are available
  after a default installation of GCS. Adapt the template to your local setup
  if these objects are not available or if you want to use other ones.
* The `mytestpe` parallel environment is limited to 5 slots by default.
  Increase this as appropriate for your cluster and needs.
* For other options, see the [qsub man page](https://github.com/hpc-gridware/clusterscheduler/blob/master/doc/markdown/man/man1/submit.include.md)
* Find more detailed documentation on these options in the GCS manuals that
  are part of your product installation (`$SGE_ROOT/doc/pdf`)
* Do not change the identifiers in curly braces (`{{ ... }}`), as they are
used to fill in the right variables.

Once this is done, the package will use your settings and no longer warn you of
the missing options.
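
As a quick end-to-end check (a sketch; `Q()` is clustermq's main entry point, and this assumes the template above submits successfully on your cluster):

```r
library(clustermq)
options(clustermq.scheduler = "gcs")

# Submits one array job with 1 worker task that evaluates 3 calls
res <- Q(function(x) x * 2, x = 1:3, n_jobs = 1)
# Once the worker has started and finished, res holds the results
# as a list, here list(2, 4, 6)
```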


### OCS

Set the following options in your _R_ session that will submit jobs:

```{r eval=FALSE}
options(
clustermq.scheduler = "ocs",
clustermq.template = "/path/to/file/below" # if using your own template
)
```

To supply your own template, save the contents below with any desired changes
to a file and have `clustermq.template` point to it.

```{sh eval=FALSE}
#$ -terse # Show job ID
#$ -N {{ job_name }} # Job name
#$ -j y # Combine stdout/stderr into one file
#$ -o {{ log_file | /dev/null }} # Output file
#$ -cwd # Use cwd as working directory
#$ -V # Export all environment variables to the job
# Depending on security settings might require a
# cluster manager to set ENABLE_SUBMIT_LIB_PATH=1 as
# *qmaster_param* to export *LD_LIBRARY_PATH*
#$ -t 1-{{ n_jobs }} # Spawns workload as tasks of an array job
#$ -pe mytestpe {{ cores | 1 }} # Allocate one slot per task
#$ -bunit C # Allocate *power core(s)* per slot
#$ -bamount {{ threads | 1 }} # Allocate *one* power core per slot
#$ -l mem_free={{ memory | 1073741824 }},h_rt={{ walltime | 3600 }},q=all.q # Resource requests and limits
#$ -ac application=clustermq,hostname={{ master }} # Tag the job

ulimit -v $(( 1024 * {{ memory | 4096 }} ))
CMQ_AUTH={{ auth }} R --no-save --no-restore -e 'clustermq:::worker("{{ master }}")'
```

In this file, `#$-*` defines command-line arguments to the `qsub` program.

* The objects used in the template example (`mytestpe`, `all.q`) are available
  after a default installation of OCS. Adapt the template to your local setup
  if these objects are not available or if you want to use other ones.
* The `mytestpe` parallel environment is limited to 5 slots by default.
  Increase this as appropriate for your cluster and needs.
* For other options, see the [qsub man page](https://github.com/hpc-gridware/clusterscheduler/blob/master/doc/markdown/man/man1/submit.include.md)
* Do not change the identifiers in curly braces (`{{ ... }}`), as they are
used to fill in the right variables.

Once this is done, the package will use your settings and no longer warn you of
the missing options.

### SGE

Set the following options in your _R_ session that will submit jobs: