azslurm scale fails with ModuleNotFoundError: No module named 'slurmcc' and NFS not mounted on scheduler VM

Question

azslurm scale fails with ModuleNotFoundError: No module named 'slurmcc' and NFS not mounted on scheduler VM

Harish Gudla 25

I’m trying to configure/use autoscaling on an Azure HPC Slurm cluster, but on the scheduler VM azslurm scale fails with a missing Python module, and it also looks like the expected NFS share(s) are not mounted.

Environment:

Node: compular-scheduler (Slurm scheduler VM on Azure)

Running as root via sudo -i

Disk and mounts look like this:

root@compular-scheduler:~# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/root        62G   36G   27G  58% /
tmpfs           7.9G     0  7.9G   0% /dev/shm
tmpfs           3.2G  1.1M  3.2G   1% /run
tmpfs           5.0M     0  5.0M   0% /run/lock
/dev/sda15      105M  6.1M   99M   6% /boot/efi
tmpfs           1.6G  4.0K  1.6G   1% /run/user/1000

I was expecting to see one or more NFS-mounted filesystems for the shared storage (e.g. /shared, /apps, or similar), but they do not appear in df -h on the scheduler.

When I run azslurm scale I get: <private data>

The azslurm script points to a virtual environment under /opt/azurehpc/slurm/venv and imports slurmcc:

$ head /opt/azurehpc/slurm/venv/bin/azslurm
#!/opt/azurehpc/slurm/venv/bin/python

import os

if "SCALELIB_LOG_USER" not in os.environ:
    os.environ["SCALELIB_LOG_USER"] = "slurm"
if "SCALELIB_LOG_GROUP" not in os.environ:
    os.environ["SCALELIB_LOG_GROUP"] = "slurm"

from slurmcc.cli import main

Python in that venv is:

$ which python
/opt/azurehpc/slurm/venv/bin/python

But slurmcc does not seem to be installed there:

$ python -m pip list | grep -i slurm
# (no output)

$ python -m pip show slurmcc
WARNING: Package(s) not found: slurmcc

So it looks like the azslurm CLI is present, but the underlying slurmcc package is missing from /opt/azurehpc/slurm/venv. At the same time, the scheduler VM does not show any NFS-mounted shared storage in df -h, which might indicate the Slurm/Azure integration or provisioning did not complete correctly.

My questions:

What is the correct way to (re)install or repair the slurmcc package and azslurm environment on an Azure HPC Slurm scheduler VM?
Is there an official script/extension or documented procedure to re-run the Azure Slurm connector / autoscaling installation on an existing scheduler without breaking the cluster?
Should the scheduler normally have NFS-mounted shared storage visible in df -h (e.g. /shared, /apps, or similar)? If yes, what is the recommended way to verify and/or re-mount the expected NFS shares on the scheduler node (As I have important data on the nfs disk)?

Any guidance on restoring a working azslurm scale command and ensuring the scheduler’s NFS mounts are correctly configured would be appreciated.I’m trying to configure/use autoscaling on an Azure HPC Slurm cluster, but on the scheduler VM azslurm scale fails with a missing Python module, and it also looks like the expected NFS share(s) are not mounted.

Himanshu Shekhar 1,860 Reputation points Microsoft External Staff Moderator

2025-11-12T14:38:03.99+00:00

Hello @Harish Gudla - Thank you for reaching Microsoft QnA platform

What is the official recommended method or script provided by Azure CycleCloud to install, repair, or reconfigure the slurmcc Python package and the azslurm autoscaling CLI environment on the scheduler VM?

Should the scheduler VM normally have NFS-mounted shared storage like /shared or /apps, and what is the documented procedure to verify and safely remount these NFS shares if they are missing?

Is there an official or supported way to re-run or repair the Azure Slurm autoscaling installation or connector on an existing scheduler VM without disrupting existing cluster workloads?

Are there known prerequisites, dependencies, or common misconfigurations that cause missing Python modules like slurmcc and failure of autoscaling commands such as azslurm scale on Azure HPC Slurm scheduler nodes?
Harish Gudla 25 Reputation points

2025-11-14T15:00:24.4266667+00:00

Hi,

Thank you for the update and for looking into my questions.

I wanted to confirm that I was able to resolve the slurmcc and azslurm issues on the scheduler node by downloading the cyclecloud-slurm Python package from the Azure GitHub repository (https://github.com/Azure/cyclecloud-slurm) and manually installing the slurmcc module into the azslurm venv. After reinstalling, the azslurm CLI is now functioning normally.

Regarding the missing NFS-mounted shared storage, both /shared and /sched were not automatically mounted on restarting the VM. I manually remounted the volumes and the scheduler is now operating as expected.

Answer accepted by question author

1 additional answer

Your answer

Himanshu Shekhar 1,860 Reputation points Microsoft External Staff Moderator

2025-11-12T14:38:03.99+00:00

Hello @Harish Gudla - Thank you for reaching Microsoft QnA platform

What is the official recommended method or script provided by Azure CycleCloud to install, repair, or reconfigure the slurmcc Python package and the azslurm autoscaling CLI environment on the scheduler VM?

Should the scheduler VM normally have NFS-mounted shared storage like /shared or /apps, and what is the documented procedure to verify and safely remount these NFS shares if they are missing?

Is there an official or supported way to re-run or repair the Azure Slurm autoscaling installation or connector on an existing scheduler VM without disrupting existing cluster workloads?

Are there known prerequisites, dependencies, or common misconfigurations that cause missing Python modules like slurmcc and failure of autoscaling commands such as azslurm scale on Azure HPC Slurm scheduler nodes?
Harish Gudla 25 Reputation points

2025-11-14T15:00:24.4266667+00:00

Hi,

Thank you for the update and for looking into my questions.

I wanted to confirm that I was able to resolve the slurmcc and azslurm issues on the scheduler node by downloading the cyclecloud-slurm Python package from the Azure GitHub repository (https://github.com/Azure/cyclecloud-slurm) and manually installing the slurmcc module into the azslurm venv. After reinstalling, the azslurm CLI is now functioning normally.

Regarding the missing NFS-mounted shared storage, both /shared and /sched were not automatically mounted on restarting the VM. I manually remounted the volumes and the scheduler is now operating as expected.

Answer 1

User was able to resolve the slurmcc and azslurm issues on the scheduler node by downloading the cyclecloud-slurm Python package from the Azure GitHub repository (https://github.com/Azure/cyclecloud-slurm) and manually installing the slurmcc module into the azslurm venv. After reinstalling, the azslurm CLI is now functioning normally.

Regarding the missing NFS-mounted shared storage, both /shared and /sched were not automatically mounted on restarting the VM. I manually remounted the volumes and the scheduler is now operating as expected.

Answer 2

To address the issues you're facing with the azslurm scale command and the missing NFS mounts on your Azure HPC Slurm scheduler VM, here are some steps and considerations:

Reinstall or Repair the slurmcc Package:
- Since the slurmcc package is missing from your virtual environment, you can try reinstalling it. First, ensure that you have the necessary permissions and access to the package repository. You can typically install it using pip:
```
     source /opt/azurehpc/slurm/venv/bin/activate
     pip install slurmcc
```
- If slurmcc is part of a larger package or suite, you may need to refer to the specific installation instructions for Azure HPC Slurm to ensure all dependencies are met.
Re-run Azure Slurm Connector / Autoscaling Installation:
- There isn't a specific documented procedure mentioned in the context for re-running the Azure Slurm connector or autoscaling installation without breaking the cluster. However, you can check the Azure documentation or support resources for guidance on reinstalling or repairing the Slurm setup. If you have the original installation scripts or commands, running them again may help.
NFS-Mounted Shared Storage:
- Yes, the scheduler should normally have NFS-mounted shared storage visible in df -h, such as /shared or /apps. If these mounts are missing, it indicates that the NFS shares may not have been set up correctly during the cluster configuration.
- To verify and re-mount the expected NFS shares, you can check the NFS server configuration and ensure that the shares are correctly exported. You can also manually mount them using:
```
     mount -t nfs <nfs-server-ip>:/path/to/share /shared
```
- Ensure that the NFS server is reachable and that the appropriate firewall rules are in place to allow NFS traffic.

By following these steps, you should be able to restore functionality to the azslurm scale command and ensure that the NFS mounts are correctly configured on your scheduler node.

References:

Share via

azslurm scale fails with ModuleNotFoundError: No module named 'slurmcc' and NFS not mounted on scheduler VM

1 additional answer

Your answer