Fixing CUDA on Slurm

I am on a Slurm GPU compute node, but get this error:

```bash
nvcc
```
```
bash: nvcc: command not found
```

How can I get `nvcc` to work?

Apr 27, 9:06pm

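The error `bash: nvcc: command not found` means the NVIDIA CUDA compiler (`nvcc`) is either not on your `PATH` or not installed on the node you are using. GPU nodes often have the NVIDIA driver installed (so `nvidia-smi` works) without having the CUDA toolkit, which provides `nvcc`, on the default `PATH`.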

How to Fix This on a Slurm GPU Node

1. Check if CUDA is Installed

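Check the default install location:

```bash
ls -d /usr/local/cuda*
```

Or search the whole filesystem (this can take a long time on clusters with large shared filesystems):

```bash
find / -name "nvcc" 2>/dev/null
```

If CUDA is installed, you’ll see paths like `/usr/local/cuda-11.8/bin/nvcc`.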
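A narrower alternative to the full `find` is to test only the usual install prefixes. The function below is a hypothetical helper (`find_nvcc` is not a standard command), and the prefixes it checks (`/usr/local/cuda*`, `/opt/cuda*`) are common conventions, not guaranteed locations on your cluster; pass any other `bin/` directories you suspect as arguments.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print every executable nvcc found on PATH, in common
# CUDA install prefixes, and in any extra bin/ directories given as arguments.
find_nvcc() {
  local dir
  # Report the nvcc already on PATH, if any (|| true keeps set -e scripts alive)
  command -v nvcc 2>/dev/null || true
  for dir in /usr/local/cuda*/bin /opt/cuda*/bin "$@"; do
    # Unmatched globs stay literal, so the -x test simply fails for them
    if [ -x "$dir/nvcc" ]; then
      echo "$dir/nvcc"
    fi
  done
}

find_nvcc
```

Symlinked installs (e.g. `/usr/local/cuda` pointing at `/usr/local/cuda-11.8`) show up once per path, which also reveals which version the unversioned path selects.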

2. Load the CUDA Module

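Most HPC clusters that run Slurm provide software through environment modules (Environment Modules or Lmod). List the available CUDA modules:

```bash
module avail cuda
```

On Lmod systems, `module spider cuda` also finds modules that only become loadable after a prerequisite (such as a compiler) is loaded. Look for a CUDA module (e.g., `cuda/11.8`, `cuda/12.1`), then load it:

```bash
module load cuda/11.8
```

Replace 11.8 with the version available on your cluster.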
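When several CUDA modules are available, the version you load should generally not be newer than what the node's driver supports; `nvidia-smi` reports that limit as "CUDA Version" in its header. The sketch below assumes module names of the form `cuda/X.Y[.Z]` (some sites use other names, e.g. `CUDA/12.1`), and `driver_cuda_version` and `pick_cuda_module` are hypothetical helpers, not module-system commands. Treat the result as a heuristic: since CUDA 11, minor-version compatibility lets some newer toolkits within the same major release run on older drivers. It also assumes GNU `sort -V`.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of any module system).
# Assumes module names of the form cuda/X.Y or cuda/X.Y.Z.

# Read `nvidia-smi` output on stdin and print the highest CUDA version
# the installed driver supports, e.g. "12.2".
driver_cuda_version() {
  sed -n 's/.*CUDA Version: *\([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' | head -n 1
}

# Read module listings on stdin and print the newest cuda/... module whose
# major.minor version is <= $1. Returns 1 if none qualifies.
pick_cuda_module() {
  local max="${1:?usage: pick_cuda_module <max-cuda-version>}"
  local name mm best=""
  # Extract cuda/X.Y[.Z] names, sorted by version, oldest first
  for name in $(grep -oE 'cuda/[0-9]+(\.[0-9]+)*' | sort -uV); do
    mm=$(printf '%s\n' "${name#cuda/}" | cut -d. -f1,2)
    # mm <= max exactly when mm sorts first under version ordering
    if [ "$(printf '%s\n%s\n' "$mm" "$max" | sort -V | head -n 1)" = "$mm" ]; then
      best="$name"
    fi
  done
  if [ -z "$best" ]; then
    return 1
  fi
  echo "$best"
}

# Example on a GPU node (module listings usually go to stderr, hence 2>&1):
#   max=$(nvidia-smi | driver_cuda_version)
#   mod=$(module -t avail cuda 2>&1 | pick_cuda_module "$max") && module load "$mod"
```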

3. Add CUDA to Your PATH Manually

If you found nvcc but it’s not in your PATH, add it:

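```bash
export PATH=/usr/local/cuda-11.8/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-11.8/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}
```

Again, replace 11.8 with your CUDA version. The `${LD_LIBRARY_PATH:+:...}` form avoids leaving a trailing colon when `LD_LIBRARY_PATH` is unset. These exports only last for the current shell, so add them to `~/.bashrc` or your job script if you need them every time.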
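To avoid typing the version in two places, the exports can be wrapped in a function that checks the directory first. `use_cuda` is a hypothetical name, and setting `CUDA_HOME` is an optional extra that some build tools read. Put the function in a file you `source` (for example from `~/.bashrc`), since exports made by an executed script do not reach the calling shell.

```bash
#!/usr/bin/env bash
# Hypothetical helper: put the CUDA toolkit rooted at $1 on PATH and
# LD_LIBRARY_PATH for the current shell.
use_cuda() {
  local root="${1:?usage: use_cuda <cuda-root>}"
  # Refuse to touch the environment if there is no nvcc under that root
  if [ ! -x "$root/bin/nvcc" ]; then
    echo "use_cuda: no executable nvcc in $root/bin" >&2
    return 1
  fi
  export CUDA_HOME="$root"   # optional; read by some build tools
  export PATH="$root/bin:$PATH"
  # Append ":old-value" only when LD_LIBRARY_PATH is non-empty (no trailing ':')
  export LD_LIBRARY_PATH="$root/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
}

# Example:
#   use_cuda /usr/local/cuda-11.8 && nvcc --version
```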

4. Verify

Check if nvcc is now available:

```bash
which nvcc
nvcc --version
```
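`nvcc --version` only proves that the compiler binary is found. A stronger check, sketched here as the hypothetical function `check_nvcc`, compiles and runs a two-thread kernel: compiling exercises `nvcc` and its host C++ compiler, and running on a node with an allocated GPU exercises the driver. It assumes `mktemp` is available and works in a throwaway directory that it removes afterwards.

```bash
#!/usr/bin/env bash
# Hypothetical helper: compile and run a tiny CUDA program.
# Returns non-zero if nvcc is missing, compilation fails, or the kernel fails.
check_nvcc() {
  if ! command -v nvcc >/dev/null 2>&1; then
    echo "check_nvcc: nvcc not found on PATH" >&2
    return 1
  fi
  local dir status
  dir=$(mktemp -d)
  cat > "$dir/hello.cu" <<'EOF'
#include <stdio.h>

// Each thread prints its index
__global__ void hello() { printf("hello from GPU thread %u\n", threadIdx.x); }

int main() {
  hello<<<1, 2>>>();
  cudaError_t err = cudaGetLastError();      // launch errors
  if (err == cudaSuccess) err = cudaDeviceSynchronize();  // execution errors
  if (err != cudaSuccess) {
    printf("CUDA error: %s\n", cudaGetErrorString(err));
    return 1;
  }
  return 0;
}
EOF
  # Compile (nvcc + host compiler), then run (driver + GPU)
  if nvcc -o "$dir/hello" "$dir/hello.cu" && "$dir/hello"; then
    status=0
  else
    status=1
  fi
  rm -rf "$dir"
  return "$status"
}

# Usage: check_nvcc && echo "CUDA toolchain OK"
```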

5. If CUDA is Not Installed

Ask your cluster admin to install CUDA on the GPU nodes.

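Or, use a container (e.g., Singularity/Apptainer) whose image includes the CUDA toolkit; the `--nv` flag makes the host’s NVIDIA driver available inside the container:

```bash
singularity exec --nv /path/to/cuda-container.img nvcc --version
```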
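If no suitable image exists yet, one can be pulled from NVIDIA’s `nvidia/cuda` images on Docker Hub. The sketch assumes either `apptainer` or `singularity` is on `PATH` (the Singularity project was renamed Apptainer when it moved to the Linux Foundation, and many sites still provide the old name). `container_cli` is a hypothetical helper, and the image tag in the example is illustrative, so check Docker Hub for current tags.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the name of the available container CLI,
# preferring apptainer and falling back to singularity.
container_cli() {
  if command -v apptainer >/dev/null 2>&1; then
    echo apptainer
  elif command -v singularity >/dev/null 2>&1; then
    echo singularity
  else
    echo "container_cli: neither apptainer nor singularity found" >&2
    return 1
  fi
}

# Example (needs network access; tag is illustrative). Use a "devel" image,
# since the "base" and "runtime" variants do not include nvcc:
#   cli=$(container_cli) || exit 1
#   "$cli" pull cuda-devel.sif docker://nvidia/cuda:12.1.0-devel-ubuntu22.04
#   "$cli" exec --nv cuda-devel.sif nvcc --version
```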

6. Slurm-Specific Tip

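If you’re submitting a batch job, load the module inside your Slurm script so the job sets up its own environment:

```bash
#!/bin/bash -l
#SBATCH --gres=gpu:1
# -l starts a login shell, which on some clusters is what defines `module`
module load cuda/11.8
nvcc --version
```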


This is a shared conversation and may include unverified content that doesn't reflect Mistral AI's views.