
LSF executor does not respect LSF_UNIT_FOR_LIMITS in lsf.conf #5182

Open
d-callan opened this issue Jul 29, 2024 · 7 comments · May be fixed by #5217

Comments

@d-callan

Bug report

Expected behavior and actual behavior

Jobs submitted to an LSF cluster should respect the value of LSF_UNIT_FOR_LIMITS in lsf.conf, per #1124. However, running on a cluster where this unit is set to MB, a task requesting 80 MB produces a header in its .command.run file like the following:

#BSUB -M 81920
#BSUB -R "select[mem>=81920] rusage[mem=80]"
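One plausible reading of those numbers (an assumption on my part, not confirmed in the thread): the -M value looks like the 80 MB request rendered in KB (Nextflow's default unit), which a cluster configured with LSF_UNIT_FOR_LIMITS=MB would then interpret as MB, inflating the effective limit 1024-fold:

```python
# Hypothetical illustration of the suspected unit mismatch (assumption:
# -M was rendered in KB but read by the cluster as MB).
requested_mb = 80
rendered = requested_mb * 1024      # 80 MB expressed in KB -> 81920
print(rendered)                     # matches the "#BSUB -M 81920" header
print(rendered / requested_mb)      # effective over-request factor: 1024.0
```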

Steps to reproduce the problem

On an LSF cluster with a non-default setting for LSF_UNIT_FOR_LIMITS, I attempted to run an nf-core pipeline:

nextflow run nf-core/metatdenovo -profile singularity,test --outdir out

Program output

The cluster fails to start jobs, reporting that I've requested more resources than the queue allows.

Environment

  • Nextflow version: I've tried 23.10.1 and 24.04.3
  • Java version: 11.0.1
  • Operating system: Linux
  • Bash version: GNU bash, version 4.2.46(2)-release (x86_64-redhat-linux-gnu)
@d-callan
Author

Possibly a crazy question, but is there a way I can work around this in the meantime of a fix? I'm kind of stuck as things are.

@d-callan
Author

As I investigate more, it seems this is due to some odd configuration on my cluster. I can't run Nextflow directly on the head node, where the correct lsf.conf exists, and for whatever reason the lsf.conf file on the worker nodes is not consistent with the head node. I've tried to ask the admins about it, and they are... something less than helpful. I think I'd like to amend this ticket to a feature request:

to be able to explicitly override this unit

@bentsherman
Member

This LSF config setting is read here:

// lsf mem unit
// https://www.ibm.com/support/knowledgecenter/en/SSETD4_9.1.3/lsf_config_ref/lsf.conf.lsf_unit_for_limits.5.html
if( conf.get('LSF_UNIT_FOR_LIMITS') ) {
    memUnit = usageUnit = conf.get('LSF_UNIT_FOR_LIMITS')
    log.debug "[LSF] Detected lsf.conf LSF_UNIT_FOR_LIMITS=$memUnit"
}

And the memory options are defined here:

if( task.config.getMemory() ) {
    def mem = task.config.getMemory()
    // LSF mem limit can be both per-process and per-job
    // depending on a system configuration setting -- see https://www.ibm.com/support/knowledgecenter/SSETD4_9.1.3/lsf_config_ref/lsf.conf.lsb_job_memlimit.5.dita
    // When per-process is used (the default), the amount of requested memory
    // is divided by the number of used cpus (processes)
    def mem1 = ( task.config.getCpus() > 1 && !perJobMemLimit ) ? mem.div(task.config.getCpus() as int) : mem
    def mem2 = ( task.config.getCpus() > 1 && perTaskReserve ) ? mem.div(task.config.getCpus() as int) : mem
    result << '-M' << String.valueOf(mem1.toUnit(memUnit))
    result << '-R' << "select[mem>=${mem.toUnit(memUnit)}] rusage[mem=${mem2.toUnit(usageUnit)}]".toString()
}
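As a rough sketch (in Python rather than Groovy, with memory already converted to a single unit), the logic above amounts to:

```python
# Minimal Python sketch of the memory-option logic above (assumption:
# mem is an integer amount already expressed in the target unit).
def lsf_mem_opts(mem, cpus, per_job_mem_limit=False, per_task_reserve=False):
    # per-process limit (the default): divide the -M limit across cpus
    mem1 = mem // cpus if cpus > 1 and not per_job_mem_limit else mem
    # per-task reservation: divide the rusage reservation across cpus
    mem2 = mem // cpus if cpus > 1 and per_task_reserve else mem
    return ['-M', str(mem1), '-R', f'select[mem>={mem}] rusage[mem={mem2}]']
```

For example, `lsf_mem_opts(8192, 4)` yields `['-M', '2048', '-R', 'select[mem>=8192] rusage[mem=8192]']`, while passing `per_job_mem_limit=True` keeps the full `-M 8192`.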

So you can see how the various config options affect the final submit options. Maybe you can use the executor.perJobMemLimit or executor.perTaskReserve options to get what you need.
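For example, a minimal nextflow.config sketch using the per-job limit (executor.perJobMemLimit is a documented Nextflow setting; whether it helps here depends on the cluster's LSB_JOB_MEMLIMIT configuration):

```groovy
// nextflow.config -- treat -M as a per-job limit rather than per-process,
// so the requested memory is not divided by the number of cpus
executor {
    perJobMemLimit = true
}
```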

@d-callan
Author

d-callan commented Aug 9, 2024

Thanks @bentsherman for the info. I had another thought recently: what do you think of explicitly adding units to the submission string, so that Nextflow produces something like bsub -M 50000KB rather than bsub -M 50000? If doable, that seems like it would make this more robust, make my problem go away, and add clarity without changing existing behavior/features.

@bentsherman
Member

I didn't realize that was an option. It would make things much simpler. Can a unit be specified for all of those memory settings?

@d-callan
Author

d-callan commented Aug 9, 2024

Hmm, good question. I've just now gone and asked for an interactive node on my cluster with bsub -M 4GB -R "select[mem>=8GB] rusage[mem=8GB]" -Is bash and nothing screamed at me or caught fire, so that seems promising.

@bentsherman
Member

Okay, I see it is documented here: https://www.ibm.com/docs/en/spectrum-lsf/10.1.0?topic=requirements-resource-requirement-strings#vnmbvn__title__3

Assuming this syntax has been supported for a while, it should be fine for Nextflow to use it. I will draft a PR.

@bentsherman bentsherman linked a pull request Aug 9, 2024 that will close this issue