NVIDIA

Copyright © 2026 NVIDIA Corporation

7 results for

Filters (1)

  • NVIDIA
    7
  • Inference

    DGX Spark
    60 MIN

    cuTile Kernels

    Run cuTile kernel benchmarks, FMHA implementation, and LLM inference on DGX Spark and B300
    Playbook
    FMHA
    1mo
    DGX Station
    30 MIN

    LLM Inference with SGLang

    Serve LLMs with SGLang on DGX Station (Qwen3-8B by default; Qwen3.6 MoE optional), covering prefix-cached multi-turn conversations, structured output, benchmarks, and inference-server guidance
    Playbook
    RadixAttention
    20d
    DGX Spark
    30 MIN

    LM Studio on DGX Spark

    Deploy LM Studio and serve LLMs on a Spark device; use LM Link to access models remotely.
    Playbook
    Inference
    4mo
    DGX Spark
    30 MIN

    Nemotron-3-Nano with llama.cpp

    Run the Nemotron-3-Nano-30B model with llama.cpp on DGX Spark
    Playbook
    Nemotron
    6mo
    DGX Spark
    30 MIN

    Run models with llama.cpp on DGX Spark

    Build llama.cpp with CUDA and serve models via an OpenAI-compatible API
    Playbook
    DGX Spark
    2mo
    DGX Station
    30 MIN

    vLLM for Inference

    Install and use vLLM on DGX Station
    Playbook
    vLLM
    3mo
    RTX Workstation
    30 MIN

    vLLM for Inference

    Install and use vLLM on NVIDIA RTX Pro 6000
    Playbook
    vLLM
    5d