6 results for

Brev managed GPU instances with Docker support. Use when running TAO training, evaluation, or inference on Brev GPU instances, managing Brev deployments, or dispatching TAO jobs through the Brev CLI. Trigger phrases include "run on Brev", "Brev GPU instan

Skill

Developer

407

Remote SLURM GPU cluster execution over SSH with sbatch/srun, Pyxis/Enroot containers, and Lustre-backed results. Use when running TAO training/eval/inference jobs on an on-prem or DGX SLURM cluster. Trigger phrases include "run on SLURM", "submit sbatch"

Skill

TAO

404

How to launch distributed Megatron-LM training jobs on a SLURM cluster. Covers a minimal sbatch skeleton, environment-variable setup for torch.distributed.run, CUDA_DEVICE_MAX_CONNECTIONS rules across hardware and parallelism modes, container conventions,

Skill

Developer

627

21d

Kubernetes execution platform — submits TAO container jobs as single-pod k8s Jobs with NVIDIA GPU scheduling. Use when running on EKS / GKE / AKS / on-prem clusters with the NVIDIA GPU Operator installed, or when integrating TAO into an existing k8s-nativ

Skill

Developer

407

DGX Cloud Lepton managed GPU compute platform with run/status/cancel interface. Use when submitting TAO jobs to DGX Cloud, dispatching training/eval/inference to Lepton GPU resources, or managing Lepton workspace deployments. Trigger phrases include "run

Skill

AI Engineer

401

Local or remote Docker execution for TAO SDK job containers using a Docker daemon with NVIDIA GPU runtime. Use when running TAO jobs on the current machine, a directly attached Docker host, or a remote GPU box exposed through DOCKER_HOST. Trigger phrases

Skill

Developer

411