5 results for

Validate and use packed sequences and long-context training in Megatron-Bridge, distinguishing offline packed SFT for LLMs from in-batch packing for VLMs, and applying the right CP constraints.

Skill

Developer

209

11d

Select, validate, patch, and deploy existing NVIDIA Dynamo Kubernetes recipes. Use for model/backend/GPU/deployment-mode recipe bring-up; use router-starter for router-only mode work and troubleshoot for broken deployments.

Skill

Developer

219

11d

Start or patch Dynamo router modes and run router endpoint smoke checks. Use for round-robin, KV-aware, least-loaded, or device-aware routing setup; use recipe-runner for recipe deployment and troubleshoot for failure diagnosis.

Skill

Developer

221

11d

How to launch distributed Megatron-LM training jobs on a SLURM cluster. Covers a minimal sbatch skeleton, environment-variable setup for torch.distributed.run, CUDA_DEVICE_MAX_CONNECTIONS rules across hardware and parallelism modes, container conventions,

Skill

Developer

205

Convert single-node scripts to multi-node Slurm sbatch jobs and debug common multi-node failures. Covers srun-native vs uv run torch.distributed approaches, container setup, NCCL timeouts, OOM sizing for MoE models, and interactive allocation.

Skill

Developer

185

11d