The message
srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
typically appears when you use scancel on a Slurm job that is still running interactive tasks (e.g., via srun). This happens because scancel sends a termination signal to the job, but srun may still be waiting for the job step to clean up.
--signal=KILL (or -s KILL)
By default, scancel sends a SIGTERM signal, which allows the job to clean up gracefully. To force an immediate kill, use the SIGKILL signal:
scancel -s KILL <job_id>
or
scancel --signal=KILL <job_id>
Because SIGKILL cannot be caught or ignored, the processes get no chance to clean up, so this normally skips the 32-second wait and terminates the job immediately.
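To make the difference between the two signals concrete, here is a minimal sketch of a task that traps SIGTERM and tidies up after itself. It is an illustration, not your actual job: `scratch_dir` and `cleanup` are hypothetical names, and the `kill -TERM $$` line only stands in for the signal that scancel would deliver.

```bash
#!/usr/bin/env bash
# Sketch: a task that tidies up when it receives SIGTERM.
# SIGKILL cannot be trapped, so this handler never runs after `scancel -s KILL`.

scratch_dir=$(mktemp -d)   # hypothetical scratch directory used by the task
cleaned=0

cleanup() {
    rm -rf "$scratch_dir"
    cleaned=1
    echo "caught SIGTERM, removed $scratch_dir"
}
trap cleanup TERM

# Stand-in for scancel's default signal: send SIGTERM to this shell.
kill -TERM $$

echo "cleaned=$cleaned"
```

If your tasks rely on a handler like this (to flush output or remove temporary files), keep in mind that sending SIGKILL skips it entirely.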
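--no-wait (if supported by your Slurm version)
Some Slurm versions may accept a --no-wait flag that stops scancel from waiting for the job to finish. This flag is not listed in the standard scancel man page, so confirm with scancel --help that your installation accepts it before relying on it:
scancel --no-wait <job_id>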
To kill several jobs at once, select them by job name or by user:
scancel -s KILL -n <job_name>   # Kill by job name
scancel -s KILL -u <username>   # Kill all jobs for a user
After canceling, verify the job is gone with:
squeue -u $USER
or
sacct -j <job_id> --format=JobID,JobName,State,ExitCode
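If you would rather not rerun squeue by hand, the check can be wrapped in a small polling loop. This is a sketch under assumptions: `wait_for_job_gone` is a hypothetical helper name, the timeout and interval defaults are arbitrary, and it treats empty `squeue -h -j` output as "the job is no longer listed".

```bash
# Poll squeue until the job no longer appears, or give up after a timeout.
# Usage: wait_for_job_gone <job_id> [timeout_seconds] [poll_interval_seconds]
wait_for_job_gone() {
    local job_id=$1
    local timeout=${2:-60}
    local interval=${3:-2}
    local waited=0

    # -h suppresses the header, so empty output means the job is not listed.
    while [ -n "$(squeue -h -j "$job_id" 2>/dev/null)" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "job $job_id still listed after ${timeout}s" >&2
            return 1
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
    return 0
}
```

For example, `scancel <job_id> && wait_for_job_gone <job_id> 90` returns success once the job has left the queue and a non-zero status if it is still listed after 90 seconds.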
scancel sends a SIGTERM by default, allowing the job to clean up.
SIGKILL (-s KILL) skips this wait.
Try scancel -s KILL <job_id> and let me know if it works for you!
If you still see the message, your Slurm configuration (for example, the KillWait setting in slurm.conf) might override this behavior. Let me know your cluster's setup.
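One way to gather that information is to look at the running configuration. The sketch below assumes scontrol is on your PATH and that KillWait is the setting behind the wait (my assumption is that the 32 seconds comes from the default KillWait of 30 seconds plus a small margin). `show_kill_wait` is a hypothetical helper name.

```bash
# Print the KillWait line from the cluster's running configuration.
show_kill_wait() {
    # `scontrol show config` prints one "Name = value" pair per line.
    scontrol show config | grep -i '^KillWait'
}
```

Run `show_kill_wait` on a login node and include the output when you describe your setup.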