Serve LLMs with SGLang on DGX Station (Qwen3-8B default; Qwen3.6 MoE optional)—prefix-cached multi-turn, structured output, benchmarks, and inference-server guidance
Build advanced AI agents for providers and patients using this developer example powered by NeMo Microservices, NVIDIA Nemotron, Riva ASR and TTS, and NVIDIA LLM NIM