The message
    srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
typically appears when you use scancel on a Slurm job that is still running interactive tasks (e.g., via srun). This happens because scancel sends a termination signal to the job, but srun may still be waiting for the job step to clean up.
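--signal=KILL (or -s KILL)
By default, scancel sends SIGTERM, which gives the job a chance to clean up gracefully. To force an immediate kill, send SIGKILL instead:
    scancel -s KILL <job_id>
or
    scancel --signal=KILL <job_id>
SIGKILL cannot be caught, so the tasks end without the cleanup period, which should avoid the 32-second wait. Note that, per the scancel man page, sending KILL to a whole job cancels its active job steps but not the job itself, so a plain scancel <job_id> may still be needed to release the allocation.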
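A cautious way to combine the two approaches is to request a normal cancel first and escalate to SIGKILL only if the job lingers. The sketch below assumes scancel and squeue are on your PATH; the function name cancel_then_kill and the 10-second grace period are hypothetical choices, not Slurm defaults.

```shell
# Hypothetical helper: cancel politely first, escalate to SIGKILL only if needed.
# Usage: cancel_then_kill <job_id> [grace_seconds]
cancel_then_kill() {
    job_id=$1
    grace=${2:-10}          # assumed grace period in seconds, tune to taste

    scancel "$job_id"       # normal cancel: tasks receive SIGTERM first

    waited=0
    while [ "$waited" -lt "$grace" ]; do
        # -h drops the header, so empty output means the job is no longer listed
        if [ -z "$(squeue -h -j "$job_id" 2>/dev/null)" ]; then
            echo "job $job_id is gone"
            return 0
        fi
        sleep 1
        waited=$((waited + 1))
    done

    echo "job $job_id still present after ${grace}s, sending SIGKILL"
    scancel -s KILL "$job_id"
}
```

Call it as cancel_then_kill <job_id> 15 to allow 15 seconds before escalating.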
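--no-wait (unverified)
A --no-wait flag is sometimes suggested for scancel, but as far as I know it is not among the options documented in the scancel man page. scancel itself returns as soon as the request is sent; the waiting is done by srun. Run scancel --help on your cluster to check before relying on this form:
    scancel --no-wait <job_id>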
To signal several jobs at once, select them by job name or by user:
    scancel -s KILL -n <job_name>    # Kill by job name
    scancel -s KILL -u <username>    # Kill all jobs for a user
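Because a user-wide kill is easy to regret, it can help to list the affected jobs first. This is a minimal sketch, assuming squeue and scancel are available; the function name kill_user_jobs and its --yes argument are hypothetical, not part of Slurm.

```shell
# Hypothetical helper: show which jobs a user-wide SIGKILL would hit,
# and only send the signal when --yes is given explicitly.
# Usage: kill_user_jobs <username> [--yes]
kill_user_jobs() {
    user=$1
    confirm=${2:-}

    # -h: no header, -o %i: print only the job ID
    ids=$(squeue -h -u "$user" -o %i)
    if [ -z "$ids" ]; then
        echo "no jobs found for $user"
        return 0
    fi

    echo "jobs that would be signalled:"
    echo "$ids"

    if [ "$confirm" = "--yes" ]; then
        scancel -s KILL -u "$user"
    else
        echo "dry run only; pass --yes to send SIGKILL"
    fi
}
```

Running kill_user_jobs "$USER" prints the job IDs without touching them; add --yes once the list looks right.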
After canceling, verify the job is gone with:
    squeue -u $USER
or
    sacct -j <job_id> --format=JobID,JobName,State,ExitCode
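If you want to script that check, sacct's parsable output is easier to test than the table view. The sketch below assumes job accounting is enabled on your cluster; job_state and is_finished are hypothetical helper names, and the list of terminal states is a common subset rather than an exhaustive one.

```shell
# Hypothetical helper: print the first word of the job's State field.
# -X: allocation only (skip steps), -n: no header, -P: parsable output
job_state() {
    sacct -j "$1" -X -n -P --format=State | head -n 1 | awk '{print $1}'
}

# Hypothetical helper: succeed if the job has reached a terminal state.
is_finished() {
    case "$(job_state "$1")" in
        COMPLETED|CANCELLED|FAILED|TIMEOUT|NODE_FAIL|OUT_OF_MEMORY) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, is_finished <job_id> && echo "done" prints "done" once accounting records the job as ended. A cancelled job usually shows a State such as "CANCELLED by <uid>", which is why only the first word is kept.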
scancel sends SIGTERM by default, allowing the job to clean up.
SIGKILL (-s KILL) skips this wait.
Try scancel -s KILL <job_id> and let me know if it works for you!
If you still see the message, your Slurm configuration might override this behavior. Let me know your cluster's setup.
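One setting worth checking is KillWait in slurm.conf, the interval Slurm allows between SIGTERM and SIGKILL (30 seconds by default). The 32 seconds in the message look like that default plus a small margin, though that arithmetic is my assumption. The sketch below reads the value from scontrol show config; kill_wait_seconds is a hypothetical helper name, and the parsing assumes the usual "KillWait = 30 sec" line format.

```shell
# Hypothetical helper: print the cluster's KillWait value in seconds.
# Assumes scontrol prints a line like "KillWait                = 30 sec".
kill_wait_seconds() {
    scontrol show config | awk -F= '
        /^KillWait *=/ {
            split($2, parts, " ")   # parts[1] is the number, parts[2] the unit
            print parts[1]
            exit
        }'
}
```

Running kill_wait_seconds on a login node shows the value your cluster actually uses.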