
Vision language model that excels in understanding the physical world using structured reasoning on videos or images.
NVIDIA Cosmos3-Nano is an open and customizable model for physical AI and robotics that enables robots and vision AI agents to reason like humans, using prior knowledge, physics understanding and common sense to understand and act in the real world. This model understands space, time, and fundamental physics, and can serve as a planning model to reason what steps an embodied agent might take next. It is part of Cosmos3-Nano, an Omni model capable of reasoning and generation capabilities.
The model is ready for commercial use.
Model Developer: NVIDIA
Cosmos3-Nano is a part of the Cosmos3 framework
Cosmos3 includes the following models:
Input Type(s): Text, Text+Image, Text+Video
Input Format(s):
Input Parameters:
Other Properties Related to Input:
Input Size and Length Limits:
Output Type(s): Text
Output Format: String
Output Parameters: Text: One-dimensional (1D)
Other Properties Related to Output:
Runtime Engine(s):
Supported Hardware Microarchitecture Compatibility:
Operating System(s):
Note: Only BF16 precision is tested.
The integration of foundation and fine-tuned models into AI systems requires additional testing using use-case-specific data to ensure safe and effective deployment. Iterative testing and validation at both unit and system levels are essential to mitigate risks, meet technical and functional requirements, and ensure compliance with safety and ethical standards before deployment.
Cosmos3-Nano-Reasoner may produce incorrect reasoning in challenging scenarios. Object states, causal relationships, spatial geometry, temporal ordering, agent intent, and future outcomes can be misinferred. Complex or long-context inputs may yield hallucinated entities, inconsistent interpretations, or implausible predictions.
See Cosmos for details: https://github.com/nvidia/cosmos
Please see the Cosmos3 technical paper for detailed evaluations of the base model: https://research.nvidia.com/labs/cosmos-lab/cosmos3/technical-report.pdf
GOVERNING TERMS: This trial service is governed by the NVIDIA API Trial Terms of Service. Use of the model is governed by OpenMDW1.1 license
Models are commercially usable.
You are free to create and distribute Derivative Models. NVIDIA does not claim ownership to any outputs generated using the Models or Derivative Models.
Global
Build.NVIDIA.com 5/31/2026 via link
Huggingface 5/31/2026 via link
Downloadable NIM - Cosmos3-Reasoner 5/31/2026 via link
NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. Developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.
Please make sure you have proper rights and permissions for all input image and video content; if image or video includes people, personal health information, or intellectual property, the image or video generated will not blur or maintain proportions of image subjects included.
Users are responsible for model inputs and outputs. Users are responsible for ensuring safe integration of this model, including implementing guardrails as well as other safety mechanisms, prior to deployment.
For more detailed information on ethical considerations for this model, please see the Model Card++ Explainability, Bias, Safety & Security, and Privacy subcards.
Please report security vulnerabilities or NVIDIA AI Concerns here.