MIG on DGX Station

15 MIN

Enable and configure Multi-Instance GPU (MIG) on DGX Station with GB300 Ultra (B300 GPUs)

Symptom: nvidia-smi -mig 1 fails or reports "MIG mode not supported"
Cause: Driver too old or GPU not MIG-capable
Fix: Use a driver version that supports MIG on GB300 (see the MIG User Guide for supported versions). Check nvidia-smi -q for the driver version and GPU model; update the driver if it is too old.

Symptom: "In use by another client" when running -mig 1, -cgi, or -mig 0
Cause: The GPU is held by another process, or MIG instances still exist
Fix: For enable/create, stop all GPU processes (desktop session, vLLM, nvsm_core, nvidia-pe, nv-hostengine, etc.). Run sudo fuser -v /dev/nvidia* to see what is using the GPUs; stop those processes and retry. For disable, destroy all MIG instances first: run sudo nvidia-smi mig -dci -i N, then sudo nvidia-smi mig -dgi -i N for each GPU index N that has instances, then run sudo nvidia-smi -mig 0.

Symptom: nvidia-smi mig -cgi ... -C -i N fails (e.g. "Invalid combination")
Cause: The profile combination exceeds GPU capacity, or the profile IDs are invalid
Fix: Run nvidia-smi mig -lgip -i N and use only the listed profile IDs. Ensure the sum of instance sizes does not exceed the GB300's capacity for that GPU.

Symptom: MIG instances not visible after creation
Cause: Instances not created, or wrong GPU index
Fix: Run nvidia-smi -L and sudo nvidia-smi mig -lgi to confirm. Re-run the -cgi commands with the correct -i <gpu_index>.

Symptom: App doesn't see a MIG device when using CUDA_VISIBLE_DEVICES=MIG-<uuid>
Cause: Wrong UUID, or the app is not reading CUDA_VISIBLE_DEVICES
Fix: Get the UUIDs from nvidia-smi -L. Export CUDA_VISIBLE_DEVICES=MIG-<uuid> in the same shell before launching the app.

Symptom: "Insufficient Permissions" when running nvidia-smi mig -lgi or -lci
Cause: Listing instances requires root
Fix: Use sudo nvidia-smi mig -lgi and sudo nvidia-smi mig -lci.

Symptom: NVLink or fabric issues on DGX/HGX after nvidia-smi -mig 0
Cause: Fabric Manager did not re-initialize
Fix: Ensure Fabric Manager is running after disabling MIG: check with sudo systemctl status nvidia-fabricmanager and start it if needed with sudo systemctl start nvidia-fabricmanager. See the MIG User Guide for details.

Symptom: Permission denied when running nvidia-smi -mig or nvidia-smi mig -cgi
Cause: MIG operations require root
Fix: Use sudo for nvidia-smi -mig 1/0 and for nvidia-smi mig -cgi ... -C, -dci, and -dgi.
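The "In use by another client" check above can be run as a small pre-flight script before enabling MIG. This is a minimal sketch: it assumes the standard /dev/nvidia* device nodes and that fuser (from psmisc) is installed.

```shell
# Pre-flight check before `sudo nvidia-smi -mig 1`: list any processes
# still holding a GPU device node so they can be stopped first.
check_gpu_holders() {
    for dev in /dev/nvidia*; do
        [ -e "$dev" ] || continue          # no NVIDIA devices present; skip
        sudo fuser -v "$dev" 2>/dev/null   # print PIDs using this node
    done
}

check_gpu_holders
echo "Stop any PIDs listed above, then retry: sudo nvidia-smi -mig 1"
```

If nothing is printed above the final message, the devices are free and the enable should succeed.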

MIG reconfiguration (day-2 operations)

To change the MIG layout (e.g. add or remove instances, or switch profiles), destroy existing instances on the affected GPU(s), then create the new layout:

  1. Destroy compute instances and GPU instances on each GPU you want to reconfigure (replace N with the GPU index):
    sudo nvidia-smi mig -dci -i N
    sudo nvidia-smi mig -dgi -i N
    
  2. Create the new layout with sudo nvidia-smi mig -cgi <profile_ids> -C -i N as in the Instructions (Step 4).
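The two steps above can be sketched as one script. This is a hedged example: GPU index 0 and profile ID 19 (1g.35gb) are placeholders only, so confirm real IDs with nvidia-smi mig -lgip, and leave DRY_RUN=1 until you run it on the DGX Station itself.

```shell
# Rebuild the MIG layout on the listed GPUs: destroy old instances,
# then create the new layout. DRY_RUN=1 (default) only echoes commands.
DRY_RUN="${DRY_RUN:-1}"
GPU_IDS="0"            # add more indices here on multi-GPU systems
NEW_LAYOUT="19,19,19"  # example only; verify IDs with `nvidia-smi mig -lgip`

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "DRY RUN: $*"
    else
        sudo "$@"
    fi
}

for i in $GPU_IDS; do
    run nvidia-smi mig -dci -i "$i"                    # destroy compute instances
    run nvidia-smi mig -dgi -i "$i"                    # destroy GPU instances
    run nvidia-smi mig -cgi "$NEW_LAYOUT" -C -i "$i"   # create new layout
done
```

Run once with DRY_RUN=1 to review the exact commands, then with DRY_RUN=0 to apply them.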

Workloads using the old MIG UUIDs must be stopped before destroying instances; they will need to be restarted with the new UUIDs from nvidia-smi -L after recreation.
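One way to hand a fresh UUID to a restarted workload is to parse it out of nvidia-smi -L. The sketch below runs against canned sample output (the GPU name and UUID are made up for illustration); on the DGX Station substitute the real nvidia-smi -L.

```shell
# Sample `nvidia-smi -L` output (illustrative only); replace with:
#   nvidia-smi -L
sample='GPU 0: NVIDIA B300 (UUID: GPU-aaaa)
  MIG 1g.35gb Device 0: (UUID: MIG-11111111-2222-3333-4444-555555555555)'

# Extract the first MIG instance UUID and expose only that device.
mig_uuid=$(printf '%s\n' "$sample" | sed -n 's/.*(UUID: \(MIG-[^)]*\)).*/\1/p' | head -n 1)
export CUDA_VISIBLE_DEVICES="$mig_uuid"
echo "CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES"
```

Launch the workload from the same shell so it inherits CUDA_VISIBLE_DEVICES.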

Profile selection guidance

Typical profile names and use cases:

1g.35gb (ID 19): Small inference, dev/test, many concurrent small jobs
1g.70gb (ID 15): Slightly larger inference or light training
2g.70gb (ID 14): Medium inference or small training
3g.139gb (ID 9): Larger inference or medium training
4g.139gb (ID 5): Heavy inference or moderate training
7g.278gb (ID 0): Full GPU as a single MIG instance; maximum memory per partition

Exact profile names may vary by driver (e.g. 1g.34gb vs 1g.35gb); use the profile IDs from nvidia-smi mig -lgip -i 0 in your -cgi commands.

Post-disable verification

After running sudo nvidia-smi -mig 0, confirm MIG is fully disabled:

nvidia-smi -q | grep -A2 "MIG Mode"

The output should show Current: Disabled for each GPU. If nvidia-smi -L still lists MIG devices, destroy any remaining instances with -dci/-dgi on each affected GPU, then run sudo nvidia-smi -mig 0 again.
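The grep check above can be scripted into a pass/fail test. This sketch parses the MIG Mode block and is demonstrated on a canned sample; on the DGX Station pipe in the real nvidia-smi -q instead, and note the exact field layout can vary by driver version.

```shell
# Return 0 if every "MIG Mode / Current" field reads Disabled, else 1.
mig_disabled() {
    awk '
        /MIG Mode/ { in_mig = 1; next }
        in_mig && /Current/ { if ($NF != "Disabled") bad = 1; in_mig = 0 }
        END { exit bad }
    '
}

# Canned sample of the relevant `nvidia-smi -q` lines (illustrative).
sample='    MIG Mode
        Current                           : Disabled
        Pending                           : Disabled'

# On the real system use:  nvidia-smi -q | mig_disabled
if printf '%s\n' "$sample" | mig_disabled; then
    echo "MIG disabled on all GPUs"
else
    echo "MIG still enabled on at least one GPU" >&2
fi
```

The function works per "MIG Mode" block, so a single GPU left in Enabled state among several is enough to make it fail.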