Configuration
First, you can use either .cfg or .json for configuration. Keys and section-names are case-sensitive. (Before April 2018, they were case-insensitive.)
pypeflow-2.0.0 offers a new, more flexible way to configure job-submission via pypeflow.
You should be able to quit, alter any of these, and resume to see the new values take effect. (This was a long-standing request from David Gordon.)
The job.defaults section should have the basics, and any defaults. You have several choices.
```ini
[job.defaults]
njobs = 32

[job.step.cns]
njobs = 8
```

That would allow up to 32 simultaneous jobs in most steps, but only 8 during falcon-consensus.
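Since .json is also accepted for configuration (as noted above), the same settings might look roughly like this in JSON form. The exact key layout here is an assumption on our part; check the pypeFLOW wiki for the authoritative schema:

```json
{
  "job.defaults": {
    "njobs": 32
  },
  "job.step.cns": {
    "njobs": 8
  }
}
```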
This is simplest, and the first thing you should ever try:

```ini
[job.defaults]
pwatcher_type = blocking
submit = /bin/bash -c "${JOB_SCRIPT}"

[General]
pwatcher_type = blocking
# Because of a bug, this is needed in the "General" section, but soon
# it will work from "job.defaults" too, which is preferred.
```

If you want to separate stderr/stdout into each task-dir, for isolated debugging:
```ini
[job.defaults]
pwatcher_type = blocking
submit = /bin/bash -c "${JOB_SCRIPT}" > "${JOB_STDOUT}" 2> "${JOB_STDERR}"

[General]
pwatcher_type = blocking
```

Note that there is no `&`; these are foreground processes. Each will block the thread which calls it.
It is easy to construct such a string for your own job-submission system, as long as the system
provides a way to do "blocking" calls. E.g. SGE uses -sync y (and -V to pass the shell environment):
```ini
[job.defaults]
pwatcher_type = blocking
submit = qsub -S /bin/bash -sync y -V \
    -q ${JOB_QUEUE} \
    -N ${JOB_NAME} \
    -o "${JOB_STDOUT}" \
    -e "${JOB_STDERR}" \
    -pe smp ${NPROC} \
    "${JOB_SCRIPT}"
JOB_QUEUE = myqueue
MB = 4000
NPROC = 4
```

By convention, we use JOB_* for most variables. However, NPROC and MB are special; those limit the resources, so the process itself will be informed of them. Aside from those, we generate the following automatically:
- JOB_STDOUT
- JOB_STDERR
- JOB_SCRIPT
- JOB_NAME
- (Some older aliases are also supported.)
You can provide default values for any of the substitution variables. (You can even define your own, but please use all-upper case.) And you can override these in the step-specific sections.
(Btw, we have had trouble with -l h_vmem=${MB}M.)
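As a sketch of a user-defined substitution variable with a step-specific override — the variable name MYACCOUNT is purely hypothetical, and the `-A` account flag is an assumption about your scheduler:

```ini
[job.defaults]
pwatcher_type = blocking
MYACCOUNT = projectA
NPROC = 4
submit = qsub -S /bin/bash -sync y -V -A ${MYACCOUNT} \
    -pe smp ${NPROC} "${JOB_SCRIPT}"

[job.step.cns]
MYACCOUNT = projectB
NPROC = 8
```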
```ini
[job.step.cns]
NPROC = 24
MB = 2000
```

Currently, the falcon "steps" are:
- job.step.dust
- job.step.da
- job.step.la
- job.step.cns
- job.step.pda
- job.step.pla
- job.step.asm (aka job.step.fc)
For other examples, see pypeFLOW configuration.
This is fairly normal. We submit jobs somehow, and we poll the filesystem to learn when each job is done.
This is a bit more convenient because we provide useful defaults for various job-submission systems. (We cannot do this generically because each system has a different way of "killing" a job early.)
```ini
[job.defaults]
pwatcher_type = fs_based
job_type = sge # choices: local/sge/lsf/pbs/slurm/torque/etc?
JOB_QUEUE = myqueue
```

```ini
[job.defaults]
pwatcher_type = fs_based
job_type = local
```

This should be used before trying sge etc., since it will test your workflow independent of any job-submission problems.
It uses `&` to put simple processes into the background.
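Roughly, a local, backgrounded submission amounts to the following (an illustrative sketch only, not pypeflow's actual code; the file names here are assumed):

```shell
# Create a trivial job script (stand-in for the generated ${JOB_SCRIPT}).
JOB_SCRIPT=$(mktemp)
printf 'echo hello\n' > "$JOB_SCRIPT"

# Run it in the background, capturing stdout/stderr.
/bin/bash "$JOB_SCRIPT" > stdout.txt 2> stderr.txt &
echo $! > job.pid   # record the PID so the job could be killed early

wait                # pypeflow instead polls the filesystem for completion
cat stdout.txt
```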
If you do not like our submit and kill strings, you can provide your own in [job.defaults].
Variable substitutions are the same as for the blocking pwatcher (above).
```ini
[job.defaults]
submit = qsub -S /bin/bash --special-flags -q myqueue -N ${JOB_NAME} "${JOB_SCRIPT}"
kill = qdel -j ${JOB_NAME}
```

It's tricky. And we don't yet have a dry-run mode. But it lets you do whatever you want.
Note: We do not yet have a way to learn the job-number from the submission command, so job-killing is subject to name-collisions. This is one reason why the "blocking" calls are easier to support.
In the past, you would specify overrides for each section.
```ini
[General]
default_concurrent_jobs = 32
cns_concurrent_jobs = 8
```

That would allow up to 32 simultaneous jobs in most steps, but only 8 during falcon-consensus.
```ini
[General]
job_queue = mydefaultqueue
sge_option_da = -pe smp 8 -q queueA
sge_option_la = -pe smp 2 -q queueA
sge_option_cns = -pe smp 8 -q queueA
sge_option_pda = -pe smp 8 -q queueB
sge_option_pla = -pe smp 2 -q queueB
sge_option_fc = -pe smp 24 -q queueB
```

Because we use Python ConfigParser, you could also do this:

```ini
[General]
job_queue = myqueue
sge_option_da = -pe smp 8 -q %(job_queue)s
```

Those still work. They are substituted into your "submit" string as ${JOB_OPTS}
if you do not provide JOB_OPTS yourself. But we recommend using the system above.
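The ConfigParser interpolation mentioned above can be checked with plain Python (stdlib only; nothing here is pypeflow-specific):

```python
# Demonstrates the "%(job_queue)s" interpolation: values in the same
# section can be referenced by name via ConfigParser's BasicInterpolation.
import configparser

text = """
[General]
job_queue = myqueue
sge_option_da = -pe smp 8 -q %(job_queue)s
"""

cp = configparser.ConfigParser()
cp.read_string(text)
print(cp["General"]["sge_option_da"])  # -> -pe smp 8 -q myqueue
```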
Why? Well, for one thing, the job needs to know how many processors were actually reserved for it. Otherwise, it could use whatever it wants. So hard-coded numbers are not helpful.
Also, it is far more flexible. You can set your own submission string, and you can pass along whatever extra variables you need.
See also:
- https://github.com/PacificBiosciences/pypeFLOW/wiki/configuration -- for general pypeflow configuration
- https://github.com/PacificBiosciences/FALCON/wiki/Options-Available -- for Falcon-specific configuration