
LSF executor does not respect LSF_UNIT_FOR_LIMITS in lsf.conf #5182

Open
d-callan opened this issue Jul 29, 2024 · 7 comments · May be fixed by #5217

Comments

@d-callan

Bug report

Expected behavior and actual behavior

Jobs submitted to an LSF cluster should respect the value of LSF_UNIT_FOR_LIMITS in lsf.conf, per #1124. However, running on a cluster where this unit is set to MB, a task requesting 80 MB produces a header in its .command.run file like the following:

#BSUB -M 81920
#BSUB -R "select[mem>=81920] rusage[mem=80]"
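One plausible reading of those numbers (an assumption on my part, not confirmed in the thread): the -M value looks like the 80 MB request rendered in KB (Nextflow's default unit), which a cluster configured with LSF_UNIT_FOR_LIMITS=MB would then interpret as MB, inflating the effective limit 1024-fold:

```python
# Hypothetical illustration of the suspected unit mismatch (assumption:
# -M was rendered in KB but read by the cluster as MB).
requested_mb = 80
rendered = requested_mb * 1024      # 80 MB expressed in KB -> 81920
print(rendered)                     # matches the "#BSUB -M 81920" header
print(rendered / requested_mb)      # effective over-request factor: 1024.0
```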

Steps to reproduce the problem

On an LSF cluster with a non-default setting for LSF_UNIT_FOR_LIMITS, I attempted to run an nf-core pipeline:

nextflow run nf-core/metatdenovo -profile singularity,test --outdir out

Program output

The cluster fails to start jobs, reporting that I've requested more resources than the queue allows.

Environment

  • Nextflow version: I've tried 23.10.1 and 24.04.3
  • Java version: 11.0.1
  • Operating system: Linux
  • Bash version: GNU bash, version 4.2.46(2)-release (x86_64-redhat-linux-gnu)
@d-callan
Author

Possibly a crazy question, but is there a way I can work around this in the meantime of a fix? I'm kind of stuck as things are.

@d-callan
Author

As I investigate more, it seems this is due to some odd configuration on my cluster. I can't run Nextflow directly on the head node, where the correct lsf.conf exists, and for whatever reason the lsf.conf file on the worker nodes is not consistent with the head node. I've tried to ask the admins about it, and they are... something less than helpful. I think I'd like to amend this ticket to a feature request:

to be able to explicitly override this unit

@bentsherman
Member

This LSF config setting is read here:

// lsf mem unit
// https://www.ibm.com/support/knowledgecenter/en/SSETD4_9.1.3/lsf_config_ref/lsf.conf.lsf_unit_for_limits.5.html
if( conf.get('LSF_UNIT_FOR_LIMITS') ) {
    memUnit = usageUnit = conf.get('LSF_UNIT_FOR_LIMITS')
    log.debug "[LSF] Detected lsf.conf LSF_UNIT_FOR_LIMITS=$memUnit"
}

And the memory options are defined here:

if( task.config.getMemory() ) {
    def mem = task.config.getMemory()
    // LSF mem limit can be both per-process and per-job
    // depending on a system configuration setting -- see https://www.ibm.com/support/knowledgecenter/SSETD4_9.1.3/lsf_config_ref/lsf.conf.lsb_job_memlimit.5.dita
    // When per-process is used (the default), the amount of requested memory
    // is divided by the number of used cpus (processes)
    def mem1 = ( task.config.getCpus() > 1 && !perJobMemLimit ) ? mem.div(task.config.getCpus() as int) : mem
    def mem2 = ( task.config.getCpus() > 1 && perTaskReserve ) ? mem.div(task.config.getCpus() as int) : mem
    result << '-M' << String.valueOf(mem1.toUnit(memUnit))
    result << '-R' << "select[mem>=${mem.toUnit(memUnit)}] rusage[mem=${mem2.toUnit(usageUnit)}]".toString()
}
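As a rough sketch (in Python rather than Groovy, with memory already converted to a single unit), the logic above amounts to:

```python
# Minimal Python sketch of the memory-option logic above (assumption:
# mem is an integer amount already expressed in the target unit).
def lsf_mem_opts(mem, cpus, per_job_mem_limit=False, per_task_reserve=False):
    # per-process limit (the default): divide the -M limit across cpus
    mem1 = mem // cpus if cpus > 1 and not per_job_mem_limit else mem
    # per-task reservation: divide the rusage reservation across cpus
    mem2 = mem // cpus if cpus > 1 and per_task_reserve else mem
    return ['-M', str(mem1), '-R', f'select[mem>={mem}] rusage[mem={mem2}]']
```

For example, `lsf_mem_opts(8192, 4)` yields `['-M', '2048', '-R', 'select[mem>=8192] rusage[mem=8192]']`, while passing `per_job_mem_limit=True` keeps the full `-M 8192`.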

So you can see how the various config options affect the final submit options. Maybe you can use the executor.perJobMemLimit or executor.perTaskReserve options to get what you need.
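For example, a minimal nextflow.config sketch using the per-job limit (executor.perJobMemLimit is a documented Nextflow setting; whether it helps here depends on the cluster's LSB_JOB_MEMLIMIT configuration):

```groovy
// nextflow.config -- treat -M as a per-job limit rather than per-process,
// so the requested memory is not divided by the number of cpus
executor {
    perJobMemLimit = true
}
```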

@d-callan
Author

d-callan commented Aug 9, 2024

Thanks @bentsherman for the info. I had another thought recently: what do you think of explicitly adding units to the submission string, so that Nextflow produces something like bsub -M 50000KB rather than bsub -M 50000? If doable, that seems like it would make this more robust, make my problem go away, and add clarity without changing existing behavior/features.

@bentsherman
Member

I didn't realize that was an option. It would make things much simpler. Can a unit be specified for all of those memory settings?

@d-callan
Author

d-callan commented Aug 9, 2024

Hmm, good question. I've just now gone and asked for an interactive node on my cluster with bsub -M 4GB -R "select[mem>=8GB] rusage[mem=8GB]" -Is bash and nothing screamed at me or caught fire, so that seems promising.

@bentsherman
Member

Okay, I see it is documented here: https://www.ibm.com/docs/en/spectrum-lsf/10.1.0?topic=requirements-resource-requirement-strings#vnmbvn__title__3

Assuming this syntax has been supported for a while, it should be fine for Nextflow to use it. I will draft a PR.

@bentsherman bentsherman linked a pull request Aug 9, 2024 that will close this issue