Try NVIDIA NIM APIs

Skip to main content

Your Privacy Choices

Copyright © 2026 NVIDIA Corporation

7 results for

Filters (1)

Publisher

NVIDIA

7

Labels (1)

Inference

Sort By

60 MIN

cuTile Kernels

Run cuTile kernel benchmarks, FMHA implementation, and LLM inference on DGX Spark and B300

1mo

Items per page

of 1 pages

30 MIN

LLM Inference with SGLang

Serve LLMs with SGLang on DGX Station (Qwen3-8B default; Qwen3.6 MoE optional)—prefix-cached multi-turn, structured output, benchmarks, and inference-server guidance

20d

30 MIN

LM Studio on DGX Spark

Deploy LM Studio and serve LLMs on a Spark device; use LM Link to access models remotely.

4mo

30 MIN

Nemotron-3-Nano with llama.cpp

Run Nemotron-3-Nano-30B model using llama.cpp on DGX Spark

6mo

30 MIN

Run models with llama.cpp on DGX Spark

Build llama.cpp with CUDA and serve models via an OpenAI-compatible API

2mo

30 MIN

vLLM for Inference

Install and use vLLM on DGX Station

3mo

RTX Workstation

30 MIN

vLLM for Inference

Install and use vLLM on NVIDIA RTX Pro 6000

5d