On a Slurm computation node with 6 CPU cores:

```bash
srun --cpus-per-task=2 --ntasks=3 bash -c 'echo $SLURM_PROCID'
```

will hang until all 6 CPU cores are available. Since the 3 tasks are independent, I would like one to start as soon as 2 CPU cores are free. How can I do that?
Jun 3, 1:33pm

To run 3 independent tasks with 2 CPUs each, and let each one start as soon as 2 CPUs are available (rather than waiting for all 6 to be free), the tasks need to be scheduled independently instead of as a single 6-CPU request.

Why the single srun waits

A single srun with --ntasks=3 and --cpus-per-task=2 requests all 6 CPUs as one unit, and the tasks are launched together once the whole request can be satisfied. The --wait and --no-kill flags do not change this: per the srun man page, --wait sets how long to wait after the first task terminates before terminating the remaining tasks, and --no-kill keeps the job from being terminated automatically when an allocated node fails.

The most reliable way is to submit each task separately or use a job array, so that each task carries its own 2-CPU request.


Option 1: Submit Tasks Individually

Run each task as a separate srun command, each requesting 2 CPUs:

```bash
srun --cpus-per-task=2 bash -c 'echo $SLURM_PROCID' &
srun --cpus-per-task=2 bash -c 'echo $SLURM_PROCID' &
srun --cpus-per-task=2 bash -c 'echo $SLURM_PROCID' &
wait
```

  • Each srun will start as soon as 2 CPUs are free.
  • The & runs them in the background, and wait ensures the script waits for all to finish.
  • Each srun launches a single task, so $SLURM_PROCID prints 0 in all three. Pass your own index if the tasks need to tell themselves apart.
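
The same pattern can be sketched as a loop. This is a minimal sketch, assuming a POSIX shell; LAUNCHER and TASK_INDEX are hypothetical names introduced here for illustration, not Slurm variables. LAUNCHER lets the loop be dry-run without Slurm, and TASK_INDEX gives each task its own index.

```bash
#!/usr/bin/env bash
# LAUNCHER is a hypothetical override (not a Slurm variable); it defaults to
# one single-task, 2-CPU srun per launch.
LAUNCHER=${LAUNCHER:-"srun --ntasks=1 --cpus-per-task=2"}

# Launch N independent tasks in the background, then wait for each one.
# Returns the number of tasks that exited with a non-zero status.
run_tasks() {
    n=$1
    pids=""
    i=0
    while [ "$i" -lt "$n" ]; do
        # TASK_INDEX is a hypothetical per-task index exported to the task.
        # $LAUNCHER is left unquoted on purpose so it splits into words.
        TASK_INDEX=$i $LAUNCHER bash -c 'echo "task $TASK_INDEX"' &
        pids="$pids $!"
        i=$((i + 1))
    done
    failed=0
    for pid in $pids; do
        wait "$pid" || failed=$((failed + 1))
    done
    return "$failed"
}
```

On the cluster, `run_tasks 3` would launch three 2-CPU sruns. Setting LAUNCHER=env before the call runs the same loop locally, which is a cheap way to check the script before submitting anything.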

Option 2: Use a Job Array

If you want to manage them under a single submission, use a job array with --array:

```bash
sbatch --array=1-3 --cpus-per-task=2 --wrap="bash -c 'echo \$SLURM_ARRAY_TASK_ID'"
```

  • Each array task is scheduled as its own job, requests 2 CPUs, and starts as soon as those are available.
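
For anything longer than a one-liner, the array can be written as a batch script. A sketch, assuming a 0-based array and three hypothetical input files (a.dat, b.dat, c.dat); input_for_task is an illustrative helper, not part of Slurm.

```bash
#!/usr/bin/env bash
# %a in the output file name expands to the array index.
#SBATCH --array=0-2
#SBATCH --cpus-per-task=2
#SBATCH --output=task_%a.out

# Map an array index to an input file (file names are hypothetical).
input_for_task() {
    case $1 in
        0) echo "a.dat" ;;
        1) echo "b.dat" ;;
        2) echo "c.dat" ;;
        *) echo "no input for index $1" >&2; return 1 ;;
    esac
}

# Slurm sets SLURM_ARRAY_TASK_ID per array task; default to 0 outside Slurm.
input=$(input_for_task "${SLURM_ARRAY_TASK_ID:-0}")
echo "task ${SLURM_ARRAY_TASK_ID:-0} would process $input"
```

Submitted with sbatch, each of the three array tasks runs this script once with its own SLURM_ARRAY_TASK_ID, so the mapping function is the only place that needs to know about the inputs.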

Option 3: Use --overcommit

If you want to keep a single srun and accept CPU sharing, use --overcommit. Per the srun man page, resources are then allocated as if only one task per node were requested, so the job asks for 2 CPUs rather than 6:

```bash
srun --cpus-per-task=2 --ntasks=3 --overcommit bash -c 'echo $SLURM_PROCID'
```

  • Warning: The job can start once 2 CPUs are free, but all 3 tasks start together and compete for those same 2 CPUs instead of each getting 2 of its own.

Key Takeaway

  • For strict resource isolation: Use Option 1 or 2 (individual srun or job arrays).
  • For starting all tasks at once on fewer CPUs: Use Option 3 (--overcommit), but be aware that the tasks then share CPUs.
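
To get a rough idea of how many 2-CPU tasks could start right now, the idle CPU count can be read from sinfo, whose %C format field prints CPUs as allocated/idle/other/total. A minimal sketch; startable_tasks is a hypothetical helper and node01 is a placeholder node name.

```bash
# Given sinfo's %C field ("allocated/idle/other/total") and a CPU count per
# task, print how many such tasks fit in the currently idle CPUs.
startable_tasks() {
    cpu_states=$1       # e.g. "4/2/0/6"
    cpus_per_task=$2
    idle=$(echo "$cpu_states" | cut -d/ -f2)
    echo $((idle / cpus_per_task))
}

# On a cluster (node01 is a placeholder):
#   startable_tasks "$(sinfo -h -n node01 -o '%C')" 2
```

The result is only a snapshot: other jobs may claim the idle CPUs before yours are scheduled, so treat it as a hint rather than a guarantee.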

Would you like help testing or refining one of these approaches?

Jun 3, 1:33pm
