A state-of-the-art model for language understanding, superior reasoning, and text generation.

The following system requirements and supported features apply when self-hosting the llama-3.1-8b-instruct model.

| Precision | Minimum GPU Memory | Recommended GPU Memory |
|---|---|---|
| bf16 | 16 GB | 33 GB |
| fp8 | 8 GB | 16 GB |
Deploying this NIM with less than the recommended amount of GPU memory requires setting the environment variable `NIM_RELAX_MEM_CONSTRAINTS=1`.

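As a sketch, assuming the standard NIM container workflow, the variable can be passed when the container is started; the image tag, cache path, and port below are illustrative placeholders for your deployment:

```shell
# Illustrative: relax the GPU memory constraint when starting the NIM container.
# Image tag, cache mount, and port are placeholders, not prescribed values.
docker run -it --rm \
  --gpus all \
  -e NGC_API_KEY \
  -e NIM_RELAX_MEM_CONSTRAINTS=1 \
  -v "$HOME/.cache/nim:/opt/nim/.cache" \
  -p 8000:8000 \
  nvcr.io/nim/meta/llama-3.1-8b-instruct:latest
```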
| Feature | Supported |
|---|---|
| LoRA Customization | ✅ |
| Fine-tuning Customization | ✅ |
| Tool Calling | ✅ |
| TensorRT-LLM Local Engine Building | ✅ |
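Tool calling with this NIM follows the OpenAI-compatible chat completions schema. As a minimal sketch, the request body below defines one callable tool; the function name, its parameters, and the endpoint URL in the comment are illustrative assumptions, not part of the model card:

```python
import json

# Sketch of an OpenAI-style tool-calling request body. The "get_weather"
# function and its schema are hypothetical examples for illustration.
payload = {
    "model": "meta/llama-3.1-8b-instruct",
    "messages": [
        {"role": "user", "content": "What is the weather in Paris?"}
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",  # hypothetical tool
                "description": "Look up current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "city": {"type": "string"}
                    },
                    "required": ["city"],
                },
            },
        }
    ],
    "tool_choice": "auto",
}

# This body would typically be POSTed to the NIM server, e.g.:
#   POST http://localhost:8000/v1/chat/completions
body = json.dumps(payload)
```

When the model decides to call the tool, the response contains a `tool_calls` entry whose arguments match the declared JSON schema.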