
Vision language model that excels in understanding the physical world using structured reasoning on videos or images.
Your question or task. Aim for up to 400 tokens (300 words); max 1000 tokens. Model can accommodate reasoning or non-reasoning answers. Enable reasoning by including this text string in the user prompt: Answer the question using the following format:<think>Your reasoning.</think> Write your final answer immediately after the </think> tag
Defines AI role/rules for session. Max 250 tokens.