
Conversation

@SanderBorgmans

@jan-janssen, thank you for the modular implementation. I had to fiddle around with the queues.yaml file before I got everything working, but if I am correct, an extra cluster property is added to the queues dict and an extra list object with all the clusters should be added. I fixed some minor errors and everything should work now. The commands are now passed as strings instead of lists, since the swap command requires the shell flag, which meant that the format flag of the queue command had to be adapted.
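The string-versus-list point can be sketched as follows. This is a minimal illustration, not pysqa's actual code: `echo` stands in for the real swap and queue commands, and the key observation is that `shell=True` takes a single command string and enables shell features such as `&&` that chaining a swap command needs.

```python
import subprocess

# With shell=True the command must be one string rather than a list of
# arguments, which allows shell constructs like "&&" to chain commands.
# "echo" is a stand-in for the real cluster-swap and queue commands.
cmd_string = "echo first && echo second"
out = subprocess.check_output(cmd_string, shell=True, universal_newlines=True)
print(out.splitlines())  # both chained commands ran in one shell
```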

@SanderBorgmans mentioned this pull request Aug 28, 2019
@jan-janssen
Member

Hi @SanderBorgmans, I am sorry for the bug. I have now added at least a very basic test to see that it is working, but testing the functionality with the queuing system is always a bit difficult. I just have one more question: is there any reason why the SLURM adapter does not work for you? I modified the tests to use slurm in the modular adapter, so maybe you can take a look at the current master. It still needs your changes to use subprocess with shell=True, but it should not be necessary to have another adapter gent.py, and there should be no need to modify basic.py.

@SanderBorgmans
Author

SanderBorgmans commented Aug 28, 2019

@jan-janssen I thought an extra wrapper was required because of the config["queue_type"] in ["GENT"] line in queueadapter.py. Aside from that, the gent wrapper handles convert_queue_status and get_job_id_from_output slightly differently, as our HPC environment returns a slightly different output for squeue, and adds the cluster column to the queue table. Moreover, it checks whether the queue_status_output is empty, since it is possible for some users to not have a single job on some clusters.

@jan-janssen
Member

@SanderBorgmans I would like to minimise the duplicated code, which is why I am a bit reluctant to add a second parser to communicate with slurm. In the current master branch I added the line https://github.com/pyiron/pysqa/blob/master/pysqa/modular.py#L10 to use slurm in the modular adapter, and I added the option shell=True, passing the commands as strings instead of lists.

@SanderBorgmans
Author

@jan-janssen This is fine with me; I will then just overwrite the slurm wrapper with my own wrapper and everything should be OK.

@jan-janssen
Member

@SanderBorgmans I still have trouble understanding why we need two different adapters for slurm. But if it is necessary, that is fine with me; maybe you could at least derive it from the SlurmCommands class, so it is obvious which parts changed.

@SanderBorgmans
Author

@jan-janssen That's a good idea. The different adapter is necessary because of https://github.com/pyiron/pysqa/blob/master/pysqa/wrapper/slurm.py#L39 and https://github.com/pyiron/pysqa/blob/master/pysqa/wrapper/slurm.py#L43 . When submitting a job with sbatch, the returned output looks like jobid;cluster, so an extra split is necessary. Furthermore, the squeue command prints an extra line "CLUSTER: clustername", which is not dropped when specifying --no-header, so again an extra operation is required to drop this line. Aside from this, the two adapters are identical, up to the cluster column that I introduced before.
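The two extra parsing steps described above could look roughly like this. This is a hypothetical simplification: the function names mirror the wrapper interface mentioned in the linked lines, but the bodies and the example cluster name are illustrative, not pysqa's actual implementation.

```python
def get_job_id_from_output(output):
    # sbatch on this setup appends the cluster name, e.g.
    # "Submitted batch job 123;mycluster", so the trailing ";cluster"
    # part has to be split off before converting to int.
    return int(output.splitlines()[-1].rstrip().split()[-1].split(";")[0])

def drop_cluster_header(lines):
    # squeue prints an extra "CLUSTER: clustername" line that is not
    # removed by the no-header flag; drop it before parsing the table.
    return [line for line in lines if not line.startswith("CLUSTER:")]

print(get_job_id_from_output("Submitted batch job 123;mycluster"))  # 123
print(drop_cluster_header(["CLUSTER: mycluster", "123 user R job1"]))
```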

I also got errors from a possibly empty queue_status_output when splitlines is called. This occurred for clusters which had no jobs in the queue. Does this happen for you?
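A minimal guard along the lines discussed could look like this. This is a simplified sketch, not the real convert_queue_status (which builds a richer table): the point is only to return an empty result instead of parsing when a cluster has no jobs queued.

```python
def convert_queue_status(queue_status_output):
    # Guard against an empty squeue output before calling splitlines();
    # some users have no jobs at all on a given cluster.
    if not queue_status_output.strip():
        return []  # no jobs on this cluster
    return [line.split() for line in queue_status_output.splitlines()]

print(convert_queue_status(""))            # empty queue handled gracefully
print(convert_queue_status("1 a R\n2 b R"))
```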

@jan-janssen
Member

> @jan-janssen That's a good idea. The different adapter is necessary because of https://github.com/pyiron/pysqa/blob/master/pysqa/wrapper/slurm.py#L39 and https://github.com/pyiron/pysqa/blob/master/pysqa/wrapper/slurm.py#L43 . When submitting a job with sbatch, the returned output looks like jobid;cluster, so an extra split is necessary. Furthermore, the squeue command prints an extra line "CLUSTER: clustername", which is not dropped when specifying --no-header, so again an extra operation is required to drop this line. Aside from this, the two adapters are identical, up to the cluster column that I introduced before.

OK, then it makes sense to have a different wrapper.

> I also got errors from a possibly empty queue_status_output when splitlines is called. This occurred for clusters which had no jobs in the queue. Does this happen for you?

I have to admit our cluster still runs SGE, so @dnoeger implemented the slurm adapter for their cluster. But it definitely makes sense to check the length of the output.

Can you modify the pull request accordingly, then we can merge it?

@jan-janssen
Member

Does this work for you? #19

@SanderBorgmans
Author

@jan-janssen Perfect, I was just about to push, but I had some git issues.

@jan-janssen
Member

Nice, then I guess we can close this pull request and I am going to push a new release to pip and conda-forge.
