MIG on DGX Station
Enable and configure Multi-Instance GPU (MIG) on DGX Station with GB300 Ultra (B300 GPUs)
| Symptom | Cause | Fix |
|---|---|---|
| nvidia-smi -mig 1 fails or "MIG mode not supported" | Driver too old or GPU not MIG-capable | Use a driver version that supports MIG on GB300 (see MIG User Guide for supported versions). Check nvidia-smi -q for driver and GPU model. Update the driver if it is too old. |
| "In use by another client" when running -mig 1, -cgi, or -mig 0 | GPU is held by another process or MIG instances still exist | To enable MIG or create instances: stop all GPU processes (desktop, vLLM, nvsm_core, nvidia-pe, nv-hostengine, etc.). Run sudo fuser -v /dev/nvidia* to see what is using the GPUs; stop those processes and retry. To disable MIG: destroy all MIG instances first. Run sudo nvidia-smi mig -dci -i N, then sudo nvidia-smi mig -dgi -i N, for each GPU index N that has instances, then run sudo nvidia-smi -mig 0. |
| nvidia-smi mig -cgi ... -C -i N fails (e.g. "Invalid combination") | Profile combination exceeds GPU capacity or invalid IDs | Run nvidia-smi mig -lgip -i N and use only listed profile IDs. Ensure the sum of instance sizes does not exceed the GB300's capacity for that GPU. |
| MIG instances not visible after creation | Instances not created or wrong GPU index | Run nvidia-smi -L and sudo nvidia-smi mig -lgi to confirm. Re-run the -cgi commands for the correct -i <gpu_index>. |
| App doesn't see MIG device when using CUDA_VISIBLE_DEVICES=MIG-<uuid> | Wrong UUID or app not using CUDA_VISIBLE_DEVICES | Get UUIDs from nvidia-smi -L. Export CUDA_VISIBLE_DEVICES=MIG-<uuid> in the same shell before launching the app. |
| "Insufficient Permissions" when running nvidia-smi mig -lgi or -lci | Listing instances requires root | Use sudo nvidia-smi mig -lgi and sudo nvidia-smi mig -lci. |
| After nvidia-smi -mig 0, NVLink or fabric issues on DGX/HGX | Fabric Manager not re-initializing | Ensure Fabric Manager is running after disabling MIG: sudo systemctl status nvidia-fabricmanager; start if needed with sudo systemctl start nvidia-fabricmanager. See MIG User Guide for details. |
| Permission denied when running nvidia-smi -mig or mig -cgi | Need root for MIG operations | Use sudo for nvidia-smi -mig 1/0, nvidia-smi mig -cgi ... -C, -dci, and -dgi. |
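For the CUDA_VISIBLE_DEVICES row above, copying a UUID by hand is an easy place to make a mistake. The following is a minimal sketch of a helper that pulls MIG UUIDs out of nvidia-smi -L output. The function name list_mig_uuids is hypothetical, and the pattern assumes MIG device lines end in "(UUID: MIG-...)", which may differ between driver versions.

```shell
# Print one MIG UUID per line from `nvidia-smi -L` output read on stdin.
# Assumption: MIG device lines end in "(UUID: MIG-...)"; adjust the pattern
# if your driver prints a different format.
list_mig_uuids() {
  sed -n 's/.*(UUID: \(MIG-[^)]*\)).*/\1/p'
}

# Example (requires a GPU with MIG instances already created):
#   export CUDA_VISIBLE_DEVICES="$(nvidia-smi -L | list_mig_uuids | head -n 1)"
```

Reading from stdin keeps the helper independent of the GPU, so it can be checked against saved nvidia-smi -L output before it is used on a live system.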
MIG reconfiguration (day-2 operations)
To change the MIG layout (e.g. add or remove instances, or switch profiles), destroy existing instances on the affected GPU(s), then create the new layout:
- Destroy compute instances, then GPU instances, on each GPU you want to reconfigure (replace N with the GPU index): run sudo nvidia-smi mig -dci -i N, then sudo nvidia-smi mig -dgi -i N.
- Create the new layout with sudo nvidia-smi mig -cgi <profile_ids> -C -i N, as in the Instructions (Step 4).
Stop any workloads that use the old MIG UUIDs before destroying instances. After the layout is recreated, restart them with the new UUIDs reported by nvidia-smi -L.
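The destroy-then-create sequence above can be sketched as a small helper that only prints the commands, so the plan can be reviewed before anything is run. The function name mig_reconfigure_cmds is hypothetical; the profile IDs in the example (9,9, i.e. two 3g.139gb instances per the profile table) are illustrative and should be checked against nvidia-smi mig -lgip on your system.

```shell
# Print the commands that rebuild the MIG layout on one GPU.
# Nothing is executed; review the output, then run the commands yourself.
# Usage: mig_reconfigure_cmds <gpu_index> <profile_ids>
mig_reconfigure_cmds() {
  gpu="$1"
  profiles="$2"
  if [ -z "$gpu" ] || [ -z "$profiles" ]; then
    echo "usage: mig_reconfigure_cmds <gpu_index> <profile_ids>" >&2
    return 1
  fi
  # Compute instances must be destroyed before their parent GPU instances.
  echo "sudo nvidia-smi mig -dci -i $gpu"
  echo "sudo nvidia-smi mig -dgi -i $gpu"
  # -C creates the default compute instance for each new GPU instance.
  echo "sudo nvidia-smi mig -cgi $profiles -C -i $gpu"
}

# Example: plan a layout of two profile-9 instances on GPU 0.
#   mig_reconfigure_cmds 0 9,9
```

Printing instead of executing is a deliberate choice here: destroying instances interrupts any workload still attached to them, so a dry run is the safer default.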
Profile selection guidance
| Profile (typical name) | Use case |
|---|---|
| 1g.35gb (ID 19) | Small inference, dev/test, many concurrent small jobs |
| 1g.70gb (ID 15) | Slightly larger inference or light training |
| 2g.70gb (ID 14) | Medium inference or small training |
| 3g.139gb (ID 9) | Larger inference or medium training |
| 4g.139gb (ID 5) | Heavy inference or moderate training |
| 7g.278gb (ID 0) | Full-GPU as single MIG instance; max memory per partition |
Exact profile names may vary by driver (e.g. 1g.34gb vs 1g.35gb); use the profile IDs from nvidia-smi mig -lgip -i 0 in your -cgi commands.
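Because names and IDs should come from the driver rather than from this table, a lookup helper can reduce copy errors. This is a sketch only: the function name mig_profile_id is hypothetical, and it assumes the -lgip table rows look like "| 0 MIG 1g.35gb 19 7/7 ... |", with the profile name in the fourth whitespace-separated field and the ID in the fifth. Verify that against your driver's actual output.

```shell
# Print the profile ID for a given profile name, reading
# `nvidia-smi mig -lgip` output on stdin.
# Assumption: data rows have the form "| <gpu> MIG <name> <id> ...".
# Usage: mig_profile_id <profile_name>
mig_profile_id() {
  awk -v name="$1" '$3 == "MIG" && $4 == name { print $5; exit }'
}

# Example (requires MIG mode enabled on GPU 0):
#   nvidia-smi mig -lgip -i 0 | mig_profile_id 3g.139gb
```

If the helper prints nothing, the name was not found; list the profiles directly and pick the ID by hand.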
Post-disable verification
After running sudo nvidia-smi -mig 0, confirm MIG is fully disabled:
nvidia-smi -q | grep -A2 "MIG Mode"
The output should show Current: Disabled for each GPU. If nvidia-smi -L still lists MIG devices, destroy any remaining instances with -dci and -dgi on each GPU, then run -mig 0 again.
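On a system with more than one GPU, reading the grep output by eye is error-prone. The check can be sketched as a function that returns success only if every GPU reports Disabled. The function name mig_all_disabled is hypothetical, and it assumes nvidia-smi -q prints a "MIG Mode" line followed by a "Current : <state>" line for each GPU.

```shell
# Succeed (exit 0) only if every "MIG Mode" block in `nvidia-smi -q`
# output on stdin reports Current: Disabled. Fails if no block is found.
mig_all_disabled() {
  awk '
    /MIG Mode/ { in_mig = 1; next }
    in_mig && /Current/ {
      seen++
      if ($NF != "Disabled") bad++
      in_mig = 0
    }
    END { exit (seen > 0 && bad == 0) ? 0 : 1 }
  '
}

# Example:
#   nvidia-smi -q | mig_all_disabled && echo "MIG disabled on all GPUs"
```

Note that a GPU reporting any other state, including N/A, makes the check fail, which is the conservative outcome for a verification step.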