---
title: "jamba-1.5-large-instruct"
publisher: "ai21labs"
type: "endpoint"
updated: "2025-05-21T21:48:36.444Z"
description: "Cutting-edge MOE based LLM designed to excel in a wide array of generative AI tasks."
canonical: "https://build.nvidia.com/ai21labs/jamba-1_5-large-instruct"
---

# Model Overview

## Description:

**Jamba 1.5 Large** is a state-of-the-art, hybrid SSM-Transformer, instruction-following foundation model. It is a Mixture-of-Experts (MoE) model with 398B total parameters and 94B active parameters.

According to AI21 Labs, the Jamba models are the most powerful and efficient long-context models on the market and the only ones with an effective context window of 256K tokens. For long-context input, they deliver up to 2.5x faster inference than leading models of comparable size.

Jamba supports function calling/tool use, structured output (JSON), and grounded generation with citation mode and documents API.
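
When structured (JSON) output is requested, the reply still arrives as a string and is worth validating before use. The following is a minimal sketch of that validation step, not part of the Jamba or NVIDIA API: `parse_json_reply` is a hypothetical helper name, and the assumption that a reply may be wrapped in a Markdown code fence is a defensive guess rather than documented behavior.

```python
import json

FENCE = "`" * 3  # Markdown code-fence marker, built this way to keep the example readable


def parse_json_reply(text: str) -> dict:
    """Parse a model reply that is expected to hold a single JSON object."""
    cleaned = text.strip()
    # Assumption: the reply may be wrapped in a Markdown code fence; drop the fence lines.
    if cleaned.startswith(FENCE):
        lines = cleaned.splitlines()[1:]  # drop the opening fence (and any language tag)
        if lines and lines[-1].strip() == FENCE:
            lines = lines[:-1]  # drop the closing fence
        cleaned = "\n".join(lines)
    data = json.loads(cleaned)  # raises json.JSONDecodeError on malformed output
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    return data


reply = FENCE + 'json\n{"city": "Paris", "population_millions": 2.1}\n' + FENCE
print(parse_json_reply(reply)["city"])  # prints "Paris"
```

Raising on malformed or non-object output keeps downstream code from silently working with partial data; a caller could catch the exception and retry the request.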

Jamba officially supports English, French, Spanish, Portuguese, German, Arabic and Hebrew, but can also work in many other languages.

## Third-Party Community Consideration:
This model is not owned or developed by NVIDIA. This model has been developed and built to a third-party’s requirements for this application and use case. Jamba 1.5 Large is developed by AI21 Labs and is available under the Jamba Open Model License for research and non-commercial use. For commercial use requiring self-deployment, a Jamba Commercial License must be acquired by contacting AI21 Labs.

## Terms of Use
GOVERNING TERMS: This trial service is governed by the [NVIDIA API Trial Terms of Service](https://assets.ngc.nvidia.com/products/api-catalog/legal/NVIDIA%20API%20Trial%20Terms%20of%20Service.pdf); and the use of this model is governed by the [Jamba Open License Agreement](https://assets.ngc.nvidia.com/products/api-catalog/legal/Jamba_Open_Model_License_Agreement.pdf).

## Reference(s):
Jamba 1.5 [blogpost](https://www.ai21.com/blog/announcing-jamba-model-family) <br>

## Model Architecture:

**Architecture Type:** Jamba (Joint Attention Mamba) <br>
**Network Architecture:** Jamba <br>
**Model Version:** 1.5 <br>

## Input:
**Input Type:** Text <br>
**Input Format:** String <br>
**Input Parameters:** One Dimensional (1D), Max Tokens, Temperature, Top P <br>
**Max Input Tokens:** 256,000 <br>

## Output:
**Output Type:** Text <br>
**Output Format:** String <br>
**Output Parameters:** 1D <br>
**Max Output Tokens:** 256,000 <br>
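
The limits above can be checked on the client before a request is sent. This is a rough sketch under stated assumptions: `check_request` is a hypothetical helper, the two constants are copied from this card, and the token counts are assumed to come from a tokenizer the caller already has. The sketch does not assume anything about how the service itself enforces these limits.

```python
MAX_INPUT_TOKENS = 256_000   # "Max Input Tokens" listed in this card
MAX_OUTPUT_TOKENS = 256_000  # "Max Output Tokens" listed in this card


def check_request(prompt_tokens: int, max_tokens: int) -> list[str]:
    """Return a list of human-readable problems; an empty list means the sizes look fine."""
    problems = []
    if prompt_tokens > MAX_INPUT_TOKENS:
        problems.append(
            f"prompt has {prompt_tokens} tokens, above the {MAX_INPUT_TOKENS} input limit"
        )
    if not 1 <= max_tokens <= MAX_OUTPUT_TOKENS:
        problems.append(
            f"max_tokens={max_tokens} is outside the range 1..{MAX_OUTPUT_TOKENS}"
        )
    return problems


print(check_request(prompt_tokens=120_000, max_tokens=1024))  # prints []
```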

## Software Integration:

**Supported Hardware Platform(s):** NVIDIA Ampere, NVIDIA Hopper <br>
**Supported Operating System(s):** Linux <br>

## Benchmarks:

| Category               | Metric         | Score |
|------------------------|----------------|-------|
| General                | Arena Hard     | 65.4  |
| General                | MMLU (CoT)     | 81.2  |
| General                | MMLU Pro (CoT) | 53.5  |
| General                | IFEval         | 81.5  |
| General                | BBH            | 65.5  |
| General                | WildBench      | 48.4  |
| Reasoning              | ARC-C          | 93    |
| Reasoning              | GPQA           | 36.9  |
| Math, Code & Tool use  | GSM8K          | 87    |
| Math, Code & Tool use  | HumanEval      | 71.3  |
| Math, Code & Tool use  | BFCL           | 85.5  |

## Ethical Considerations:
NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications.  When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.

Please report security vulnerabilities or NVIDIA AI Concerns [here](https://www.nvidia.com/en-us/support/submit-security-vulnerability/).

## Prototype

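```python
import os

from openai import OpenAI

# Read the API key from the environment rather than hard-coding it.
client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",
    api_key=os.environ["NVIDIA_API_KEY"],
)

completion = client.chat.completions.create(
    model="ai21labs/jamba-1.5-large-instruct",
    # Example prompt; replace with your own.
    messages=[{"role": "user", "content": "Write a haiku about long-context models."}],
    temperature=0.2,  # illustrative value
    top_p=0.7,        # illustrative value
    max_tokens=1024,  # illustrative value
    stream=False,     # non-streaming, so the full message is available below
)

print(completion.choices[0].message.content)
```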

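```python
import os

from langchain_nvidia_ai_endpoints import ChatNVIDIA

# Read the API key from the environment rather than hard-coding it.
client = ChatNVIDIA(
    model="ai21labs/jamba-1.5-large-instruct",
    api_key=os.environ["NVIDIA_API_KEY"],
    temperature=0.2,  # illustrative value
    top_p=0.7,        # illustrative value
    max_tokens=1024,  # illustrative value
)

# Example prompt; replace with your own.
response = client.invoke([{"role": "user", "content": "Write a haiku about long-context models."}])
print(response.content)
```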

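```javascript
import OpenAI from 'openai';

// Read the API key from the environment rather than hard-coding it.
const openai = new OpenAI({
  apiKey: process.env.NVIDIA_API_KEY,
  baseURL: 'https://integrate.api.nvidia.com/v1',
});

async function main() {
  const completion = await openai.chat.completions.create({
    model: "ai21labs/jamba-1.5-large-instruct",
    // Example prompt; replace with your own.
    messages: [{ role: "user", content: "Write a haiku about long-context models." }],
    temperature: 0.2, // illustrative value
    top_p: 0.7,       // illustrative value
    max_tokens: 1024, // illustrative value
    stream: false,    // non-streaming, so the full message is available below
  });

  process.stdout.write(completion.choices[0]?.message?.content ?? "");
}

main();
```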

```bash
# The prompt and sampling values below are illustrative; adjust as needed.
curl https://integrate.api.nvidia.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $NVIDIA_API_KEY" \
  -d '{
    "model": "ai21labs/jamba-1.5-large-instruct",
    "messages": [{"role":"user","content":"Write a haiku about long-context models."}],
    "temperature": 0.2,
    "top_p": 0.7,
    "max_tokens": 1024,
    "stream": false
  }'
```