
nvidia
cosmos-reason2-8b
Vision language model that excels in understanding the physical world using structured reasoning on videos or images.
Input
Drop files here. Accepted formats: .mp4, .jpg, .jpeg, .png
Your question or task. Aim for up to 400 tokens (about 300 words); the maximum is 1000 tokens. The model can give reasoning or non-reasoning answers. To enable reasoning, include this text string in the user prompt: Answer the question using the following format:<think>Your reasoning.</think> Write your final answer immediately after the </think> tag
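As a rough illustration of the prompt guidance above, the sketch below appends the reasoning text string to a question, checks the result against the 1000-token maximum, and separates the reasoning from the final answer in a reply. The helper names (`estimate_tokens`, `build_user_prompt`, `split_reasoning`) are hypothetical and not part of any NVIDIA API. The token estimate is an assumption based only on the "400 tokens (300 words)" ratio quoted above, not on the model's actual tokenizer.

```python
import math

# Text string quoted from the prompt guidance; including it enables reasoning.
REASONING_SUFFIX = (
    "Answer the question using the following format:"
    "<think>Your reasoning.</think> "
    "Write your final answer immediately after the </think> tag"
)

MAX_TOKENS = 1000  # stated maximum for the user prompt
# Assumption: words-to-tokens ratio implied by "400 tokens (300 words)".
TOKENS_PER_WORD = 400 / 300


def estimate_tokens(text: str) -> int:
    """Rough word-count estimate; NOT the model's real tokenizer."""
    return math.ceil(len(text.split()) * TOKENS_PER_WORD)


def build_user_prompt(question: str, reasoning: bool = False) -> str:
    """Return the user prompt, optionally with the reasoning string appended."""
    prompt = question.strip()
    if reasoning:
        prompt = f"{prompt}\n\n{REASONING_SUFFIX}"
    if estimate_tokens(prompt) > MAX_TOKENS:
        raise ValueError("prompt likely exceeds the 1000-token maximum")
    return prompt


def split_reasoning(answer: str) -> tuple[str, str]:
    """Split a reply into (reasoning, final_answer) at the </think> tag."""
    head, sep, tail = answer.partition("</think>")
    if not sep:
        # No reasoning block present: the whole reply is the answer.
        return "", answer.strip()
    return head.replace("<think>", "", 1).strip(), tail.strip()
```

For example, `build_user_prompt("Is the forklift moving toward the pallet?", reasoning=True)` returns the question followed by the reasoning string, and `split_reasoning` can then be applied to whatever text the model returns.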
Defines the AI's role and rules for the session. Max 250 tokens.
Using free API for development