llama-3.1-nemotron-nano-8b-v1 Model by NVIDIA

The following are key system requirements and supported features to consider when self-hosting the llama-3.1-nemotron-nano-8b-v1 model.

GPU Memory Requirements

Precision	Minimum GPU Memory	Recommended GPU Memory
bf16	16 GB	33 GB
fp8	8 GB	16 GB

To deploy this NIM with less than the recommended amount of GPU memory, set the environment variable NIM_RELAX_MEM_CONSTRAINTS=1.
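
As a minimal sketch of how the memory table and the NIM_RELAX_MEM_CONSTRAINTS setting relate, the Python helper below decides which extra environment variable, if any, a given GPU needs. The function name nim_memory_env and the dictionary layout are hypothetical and not part of NIM; the numbers are copied from the table above. Treating a GPU below the minimum as an error is an assumption of this sketch, not documented NIM behavior.

```python
from typing import Dict

# GPU memory figures in GB, copied from the table above.
MEMORY_REQUIREMENTS_GB = {
    "bf16": {"minimum": 16, "recommended": 33},
    "fp8": {"minimum": 8, "recommended": 16},
}


def nim_memory_env(precision: str, gpu_memory_gb: float) -> Dict[str, str]:
    """Return the extra environment variables for a GPU of the given size.

    Hypothetical helper for illustration only; it is not part of NIM.
    """
    requirements = MEMORY_REQUIREMENTS_GB[precision]
    if gpu_memory_gb < requirements["minimum"]:
        # Assumption: anything below the listed minimum is treated as unsupported.
        raise ValueError(
            f"{precision} needs at least {requirements['minimum']} GB of GPU memory"
        )
    if gpu_memory_gb < requirements["recommended"]:
        # Between minimum and recommended: relax the memory constraints.
        return {"NIM_RELAX_MEM_CONSTRAINTS": "1"}
    # At or above the recommended amount: no extra setting is needed.
    return {}


if __name__ == "__main__":
    # A 24 GB GPU running bf16 sits between the minimum and the recommended amount.
    print(nim_memory_env("bf16", 24))  # {'NIM_RELAX_MEM_CONSTRAINTS': '1'}
```

The returned dictionary could then be merged into whatever environment the container or process is launched with.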

Supported Features

Feature	Supported
LoRA Customization	✅
Fine-tuning Customization	✅
Tool Calling	✅
TensorRT-LLM Local Engine Building	✅