Cosmos3-Nano-Reasoner Overview

Description:

NVIDIA Cosmos3-Nano is an open and customizable model for physical AI and robotics that enables robots and vision AI agents to reason like humans, using prior knowledge, physics understanding and common sense to understand and act in the real world. This model understands space, time, and fundamental physics, and can serve as a planning model to reason what steps an embodied agent might take next. It is part of Cosmos3-Nano, an Omni model capable of reasoning and generation capabilities.

The model is ready for commercial use.

Model Developer: NVIDIA

Model Versions

Cosmos3-Nano is a part of the Cosmos3 framework

Cosmos3 includes the following models:

Cosmos3-Nano (Reasoning tower): Given a text prompt and an input video, think and generate the answer with respect to the input text prompt and video.
Cosmos3-Super (Reasoning tower): Given a text prompt and an input video, think and generate the answer with respect to the input text prompt and video.

Input:

Input Type(s): Text, Text+Image, Text+Video

Input Format(s):

Text: String
Image: jpg, png, jpeg, webp
Video: mp4

Input Parameters:

Text: One-dimensional (1D)
Image: Two-dimensional (2D)
Video: Three-dimensional (3D)

Other Properties Related to Input:

Video inputs are recommended at a frame rate of 4 fps.
Long-context inputs are supported up to 256K tokens.
Image inputs may be passed as files or URLs.
Video inputs should be mp4 and follow the 4 fps recommendation for Reasoner usage.

Input Size and Length Limits:

Text: up to 256K tokens in the context window.
Image: standard supported image formats passed as file or URL.
Video: mp4 at the recommended 4 fps.

Output:

Output Type(s): Text

Output Format: String

Output Parameters: Text: One-dimensional (1D)

Other Properties Related to Output:

Default max_tokens=4096+ is recommended for reasoning outputs; longer outputs may be requested.
Reasoning outputs may include structured reasoning, 2D/3D point localization, and bounding-box coordinates for vision-based tasks.
Outputs are not guaranteed to be correct and should not be treated as safety-certified decisions or ground truth.

Software Integration:

Runtime Engine(s):

vLLM for OpenAI-compatible Reasoner serving.
PyTorch and Cosmos framework workflows for local development and integration.

Supported Hardware Microarchitecture Compatibility:

NVIDIA Ampere
NVIDIA Blackwell
NVIDIA Hopper

Operating System(s):

Linux. Other operating systems have not been tested.

Note: Only BF16 precision is tested.

The integration of foundation and fine-tuned models into AI systems requires additional testing using use-case-specific data to ensure safe and effective deployment. Iterative testing and validation at both unit and system levels are essential to mitigate risks, meet technical and functional requirements, and ensure compliance with safety and ethical standards before deployment.

Limitations:

Cosmos3-Nano-Reasoner may produce incorrect reasoning in challenging scenarios. Object states, causal relationships, spatial geometry, temporal ordering, agent intent, and future outcomes can be misinferred. Complex or long-context inputs may yield hallucinated entities, inconsistent interpretations, or implausible predictions.

Usage:

See Cosmos for details: https://github.com/nvidia/cosmos

Quality Benchmarks

Please see the Cosmos3 technical paper for detailed evaluations of the base model: https://research.nvidia.com/labs/cosmos-lab/cosmos3/technical-report.pdf

License and Terms of Use:

GOVERNING TERMS: This trial service is governed by the NVIDIA API Trial Terms of Service. Use of the model is governed by OpenMDW1.1 license

Models are commercially usable.

You are free to create and distribute Derivative Models. NVIDIA does not claim ownership to any outputs generated using the Models or Derivative Models.

Deployment Geography:

Global

Release Date:

Build.NVIDIA.com 5/31/2026 via link

Huggingface 5/31/2026 via link

Downloadable NIM - Cosmos3-Reasoner 5/31/2026 via link

Ethical Considerations

NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. Developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.

Please make sure you have proper rights and permissions for all input image and video content; if image or video includes people, personal health information, or intellectual property, the image or video generated will not blur or maintain proportions of image subjects included.

Users are responsible for model inputs and outputs. Users are responsible for ensuring safe integration of this model, including implementing guardrails as well as other safety mechanisms, prior to deployment.

For more detailed information on ethical considerations for this model, please see the Model Card++ Explainability, Bias, Safety & Security, and Privacy subcards.

Please report security vulnerabilities or NVIDIA AI Concerns here.

NVIDIA

cosmos3-nano-reasoner