llama-3.1-nemotron-nano-4b-v1.1 Model by NVIDIA

The following are the key system requirements and supported features to consider when self-hosting the llama-3.1-nemotron-nano-4b-v1.1 model.

GPU Memory Requirements

Precision	Minimum GPU Memory	Recommended GPU Memory
bf16	9 GB	26 GB
fp8	4 GB	13 GB

To deploy this NIM on a GPU with less than the recommended amount of memory, set the environment variable NIM_RELAX_MEM_CONSTRAINTS to 1.
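
The rule above can be sketched as a small pre-flight check. This is a minimal illustration, not part of the NIM tooling: the helper name plan_deployment and the GPU_MEMORY_GB mapping are hypothetical, the thresholds are copied from the table above, and treating anything below the minimum as undeployable is an assumption.

```python
# Thresholds in GB, copied from the GPU memory table above.
GPU_MEMORY_GB = {
    "bf16": {"minimum": 9, "recommended": 26},
    "fp8": {"minimum": 4, "recommended": 13},
}


def plan_deployment(precision: str, available_gb: float) -> dict:
    """Return the extra environment variables needed for a given GPU.

    Hypothetical helper: raises ValueError for an unknown precision or for
    a GPU below the minimum (assumed here to be undeployable).
    """
    if precision not in GPU_MEMORY_GB:
        raise ValueError(f"unknown precision: {precision!r}")

    limits = GPU_MEMORY_GB[precision]
    if available_gb < limits["minimum"]:
        raise ValueError(
            f"{available_gb} GB is below the {limits['minimum']} GB "
            f"minimum for {precision}"
        )

    # Between minimum and recommended: the memory check must be relaxed.
    if available_gb < limits["recommended"]:
        return {"NIM_RELAX_MEM_CONSTRAINTS": "1"}

    # At or above the recommended amount: no extra setting is needed.
    return {}


if __name__ == "__main__":
    print(plan_deployment("bf16", 16))
    print(plan_deployment("fp8", 16))
```

With a 16 GB GPU, for example, bf16 falls between the minimum and recommended values and so needs the variable, while fp8 already meets its recommended amount.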

Supported Features

Feature	Supported
LoRA Customization	✅
Fine-tuning Customization	✅
Tool Calling	✅
TensorRT-LLM Local Engine Building	✅