---
title: "ministral-14b-instruct-2512"
publisher: "mistralai"
type: "endpoint"
updated: "2025-12-02T15:26:23.480Z"
description: "A general purpose VLM ideal for chat and instruction based use cases"
canonical: "https://build.nvidia.com/mistralai/ministral-14b-instruct-2512"
---

# Ministral 3 14B Instruct 2512

## Description
Ministral 3 14B Instruct 2512 FP8 is the largest model in the Ministral 3 family, offering frontier capabilities and performance comparable to its larger Mistral Small 3.2 24B counterpart. A powerful and efficient language model with vision capabilities, this instruct post-trained version in **FP8 precision** is fine-tuned for instruction tasks, making it ideal for chat and instruction-based use cases.

The FP8 quantization reduces memory requirements while maintaining model quality: the model fits in 24GB of VRAM, and in less if further quantized.

*This model is ready for commercial/non-commercial use.*

## Third-Party Community Consideration:
This model is not owned or developed by NVIDIA. This model has been developed and built to a third-party's requirements for this application and use case; see the non-NVIDIA [Ministral 3 14B Instruct 2512 FP8 Model Card](https://huggingface.co/mistralai/Ministral-3-14B-Instruct-2512-FP8).

## License and Terms of Use:
**GOVERNING TERMS:** This trial service is governed by the [NVIDIA API Trial Terms of Service](https://assets.ngc.nvidia.com/products/api-catalog/legal/NVIDIA%20API%20Trial%20Terms%20of%20Service.pdf). Use of this model is governed by the [NVIDIA Community Model License](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-community-models-license/). Additional Information: [Apache License Version 2.0](https://www.apache.org/licenses/LICENSE-2.0.txt).

## Deployment Geography:
Global

## Use Case:
**Use Case:** Designed for private AI deployments where advanced capabilities meet practical hardware constraints. Ideal for private/custom chat and AI assistant deployments in constrained environments, advanced local agentic use cases, fine-tuning and specialization, and bringing advanced AI capabilities to edge environments. The FP8 quantization enables efficient deployment on resource-constrained hardware.

## Release Date:
**Build.NVIDIA.com:** 12/2025 via [link](https://build.nvidia.com/mistralai/ministral-14b-instruct-2512)  
**Hugging Face:** 12/2025 via [link](https://huggingface.co/mistralai/Ministral-3-14B-Instruct-2512-FP8)

## Reference(s):
**References:** 
- [Ministral 3 Family on Hugging Face](https://huggingface.co/collections/mistralai/ministral-3)
- [Ministral 3 14B Instruct BF16 Version](https://huggingface.co/mistralai/Ministral-3-14B-Instruct-2512)
- [Ministral 3 - More Formats Collection](https://huggingface.co/collections/mistralai/ministral-3-more)
- [vLLM Framework](https://github.com/vllm-project/vllm)
- [Mistral Common Library](https://github.com/mistralai/mistral-common)
- [System Prompt Configuration](https://huggingface.co/mistralai/Mistral-Small-3.2-24B-Instruct-2506/blob/main/SYSTEM_PROMPT.txt)

<details>
<summary><b>Ministral 3 Family</b></summary>

| Model Name | Type | Precision | Link |
|------------|------|-----------|------|
| Ministral 3 3B Base 2512 | Base pre-trained | BF16 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-3B-Base-2512) |
| Ministral 3 3B Instruct 2512 | Instruct post-trained | FP8 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-3B-Instruct-2512) |
| Ministral 3 3B Reasoning 2512 | Reasoning capable | BF16 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-3B-Reasoning-2512) |
| Ministral 3 8B Base 2512 | Base pre-trained | BF16 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-8B-Base-2512) |
| Ministral 3 8B Instruct 2512 | Instruct post-trained | FP8 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-8B-Instruct-2512) |
| Ministral 3 8B Reasoning 2512 | Reasoning capable | BF16 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-8B-Reasoning-2512) |
| Ministral 3 14B Base 2512 | Base pre-trained | BF16 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-14B-Base-2512) |
| Ministral 3 14B Instruct 2512 | Instruct post-trained | FP8 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-14B-Instruct-2512) |
| **Ministral 3 14B Instruct 2512 FP8** | **Instruct post-trained** | **FP8** | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-14B-Instruct-2512-FP8) |
| Ministral 3 14B Reasoning 2512 | Reasoning capable | BF16 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-14B-Reasoning-2512) |

</details>

## Model Architecture:
**Architecture Type:** Transformer  
**Network Architecture:** Ministral (13.5B Language Model + 0.4B Vision Encoder)  
**Total Parameters:** 14B  
**Active Parameters:** 14B  
**Vocabulary Size:** Undisclosed  
**Base Model:** mistralai/Ministral-3-14B-Base-2512

### Input:
**Input Types:** Image, Text  
**Input Formats:** Red, Green, Blue (RGB), String  
**Input Parameters:** Two Dimensional (2D), One Dimensional (1D)  
**Other Input Properties:** Supports multimodal input with up to 10 images per prompt. Images are processed through a 0.4B vision encoder. Text inputs support multilingual content (English, French, Spanish, German, Italian, Portuguese, Dutch, Chinese, Japanese, Korean, Arabic). Recommended system prompt configuration available in repository.  
**Input Context Length (ISL):** 262,144

### Output:
**Output Types:** Text  
**Output Format:** String  
**Output Parameters:** One Dimensional (1D)  
**Other Output Properties:** Supports native function calling and JSON output formatting. Best results achieved with temperature=0.15. Strong system prompt adherence for tailored responses.  
**Output Context Length (OSL):** Undisclosed

__Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g. GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.__

## Software Integration:
**Runtime Engines:**
- **vLLM:** 0.12.0 or higher (recommended)
- **Transformers:** Latest version with mistral-common >= 1.8.6

**Supported Hardware:**
- **NVIDIA Ampere:** A100 (80GB), A40, A30, A10, GeForce RTX 3090
- **NVIDIA Blackwell:** B200, B100, GB200
- **NVIDIA Hopper:** H100, H200
- **NVIDIA Lovelace:** L40, L40S, L4, GeForce RTX 4090, RTX 6000 Ada Generation

**Operating Systems:** Linux

**Additional Testing Statement:**
__The integration of foundation and fine-tuned models into AI systems requires additional testing using use-case-specific data to ensure safe and effective deployment. Following the V-model methodology, iterative testing and validation at both unit and system levels are essential to mitigate risks, meet technical and functional requirements, and ensure compliance with safety and ethical standards before deployment.__

## Model Version(s)
v1.0 (December 2025)

## Training, Testing, and Evaluation Datasets:

### Training Dataset
**Data Modality:** Undisclosed  
**Training Data Collection:** Undisclosed  
**Training Labeling:** Undisclosed  
**Training Properties:** Undisclosed

### Testing Dataset
**Testing Data Collection:** Undisclosed  
**Testing Labeling:** Undisclosed  
**Testing Properties:** Undisclosed

### Evaluation Dataset
**Evaluation Benchmark Score:** The benchmark results below compare Ministral 3 14B with similarly sized models across reasoning, instruct, and base model evaluations.

<details>
<summary><b>Reasoning Benchmarks</b></summary>

| Model | AIME25 | AIME24 | GPQA Diamond | LiveCodeBench |
|-------|--------|--------|--------------|---------------|
| **Ministral 3 14B** | **0.850** | **0.898** | **0.712** | **0.646** |
| Qwen3-14B (Thinking) | 0.737 | 0.837 | 0.663 | 0.593 |
|  |  |  |  |  |
| **Ministral 3 8B** | 0.787 | **0.860** | 0.668 | **0.616** |
| Qwen3-VL-8B-Thinking | **0.798** | **0.860** | **0.671** | 0.580 |
|  |  |  |  |  |
| **Ministral 3 3B** | **0.721** | **0.775** | 0.534 | **0.548** |
| Qwen3-VL-4B-Thinking | 0.697 | 0.729 | **0.601** | 0.513 |

</details>

<details>
<summary><b>Instruct Benchmarks</b></summary>

| Model | Arena Hard | WildBench | MATH Maj@1 | MM MTBench |
|-------|------------|-----------|------------|------------|
| **Ministral 3 14B** | **0.551** | **68.5** | **0.904** | **8.49** |
| Qwen3 14B (Non-Thinking) | 0.427 | 65.1 | 0.870 | NOT MULTIMODAL |
| Gemma3-12B-Instruct | 0.436 | 63.2 | 0.854 | 6.70 |
|  |  |  |  |  |
| **Ministral 3 8B** | 0.509 | **66.8** | 0.876 | **8.08** |
| Qwen3-VL-8B-Instruct | **0.528** | 66.3 | **0.946** | 8.00 |
|  |  |  |  |  |
| **Ministral 3 3B** | 0.305 | **56.8** | 0.830 | 7.83 |
| Qwen3-VL-4B-Instruct | **0.438** | **56.8** | **0.900** | **8.01** |
| Qwen3-VL-2B-Instruct | 0.163 | 42.2 | 0.786 | 6.36 |
| Gemma3-4B-Instruct | 0.318 | 49.1 | 0.759 | 5.23 |

</details>

<details>
<summary><b>Base Model Benchmarks</b></summary>

| Model | Multilingual MMLU | MATH CoT 2-Shot | AGIEval 5-shot | MMLU Redux 5-shot | MMLU 5-shot | TriviaQA 5-shot |
|-------|-------------------|-----------------|----------------|-------------------|-------------|-----------------|
| **Ministral 3 14B** | 0.742 | **0.676** | 0.648 | 0.820 | 0.794 | 0.749 |
| Qwen3 14B Base | **0.754** | 0.620 | **0.661** | **0.837** | **0.804** | 0.703 |
| Gemma 3 12B Base | 0.690 | 0.487 | 0.587 | 0.766 | 0.745 | **0.788** |
|  |  |  |  |  |  |  |
| **Ministral 3 8B** | **0.706** | **0.626** | 0.591 | 0.793 | **0.761** | **0.681** |
| Qwen 3 8B Base | 0.700 | 0.576 | **0.596** | **0.794** | 0.760 | 0.639 |
|  |  |  |  |  |  |  |
| **Ministral 3 3B** | 0.652 | **0.601** | 0.511 | 0.735 | 0.707 | 0.592 |
| Qwen 3 4B Base | **0.677** | 0.405 | **0.570** | **0.759** | **0.713** | 0.530 |
| Gemma 3 4B Base | 0.516 | 0.294 | 0.430 | 0.626 | 0.589 | **0.640** |

</details>

**Evaluation Data Collection:** Automated  
**Evaluation Labeling:** Automated  
**Evaluation Properties:** Standard industry benchmarks for reasoning (AIME25, AIME24, GPQA Diamond, LiveCodeBench), instruct (Arena Hard, WildBench, MATH Maj@1, MM MTBench), and base model performance (Multilingual MMLU, MATH CoT 2-Shot, AGIEval, MMLU Redux, MMLU, TriviaQA). Complete benchmark results available in the source model card linked above.

## Inference
**Acceleration Engine:** Other (vLLM with mistral-common tokenizer)  
**Test Hardware:** Due to its size and the FP8 format of its weights, Ministral 3 14B Instruct 2512 can run on a single H200 GPU. The model requires approximately 24GB of VRAM in FP8 precision, and less if further quantized. It can also run on edge devices and in local deployments.
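
As a rough sanity check on the VRAM figure above, weight memory can be estimated from the parameter counts listed under Model Architecture (13.5B language model + 0.4B vision encoder). The sketch below assumes 1 byte per parameter for FP8 and 2 bytes for BF16, and treats every parameter as stored in the same precision, which may not hold for the actual checkpoint. The `weight_memory_gb` helper is a hypothetical name used only for illustration.

```python
# Rough weight-memory estimate; ignores KV cache, activations, and runtime overhead.
LANGUAGE_MODEL_PARAMS = 13.5e9  # from the Model Architecture section
VISION_ENCODER_PARAMS = 0.4e9   # from the Model Architecture section

# Bytes needed to store one parameter in each format.
BYTES_PER_PARAM = {"fp8": 1, "bf16": 2}


def weight_memory_gb(precision: str) -> float:
    """Return approximate weight memory in GB (10^9 bytes), assuming uniform precision."""
    total_params = LANGUAGE_MODEL_PARAMS + VISION_ENCODER_PARAMS
    return total_params * BYTES_PER_PARAM[precision] / 1e9


for precision in ("fp8", "bf16"):
    print(f"{precision}: ~{weight_memory_gb(precision):.1f} GB of weights")
```

Under these assumptions the FP8 weights take roughly 13.9 GB and BF16 weights roughly 27.8 GB. The remainder of a 24GB budget is what is left for the KV cache, activations, and runtime overhead, none of which this estimate models.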

## Additional Details

### Recommended Deployment Settings

- **Temperature:** 0.15 (recommended for best results)
- **Max Tokens:** Up to 262,144 (256K context window)
- **System Prompt:** Use provided [SYSTEM_PROMPT.txt](https://huggingface.co/mistralai/Mistral-Small-3.2-24B-Instruct-2506/blob/main/SYSTEM_PROMPT.txt) for general assistant use
- **Image Limit:** Up to 10 images per prompt
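
The settings above can be gathered into a small helper that builds the keyword arguments for an OpenAI-compatible chat completion call. This is a minimal sketch: `build_chat_request`, `RECOMMENDED_TEMPERATURE`, and `MAX_IMAGES` are hypothetical names, not part of vLLM or the OpenAI client, and the default `max_tokens` value is an illustrative assumption rather than a recommendation from the model card.

```python
from typing import Any, Optional

RECOMMENDED_TEMPERATURE = 0.15  # recommended in the settings above
MAX_IMAGES = 10                 # image limit per prompt, from the settings above


def build_chat_request(
    model: str,
    system_prompt: str,
    user_text: str,
    image_urls: Optional[list[str]] = None,
    max_tokens: int = 1024,  # illustrative default (assumption)
) -> dict[str, Any]:
    """Build kwargs for `client.chat.completions.create(**kwargs)`."""
    image_urls = image_urls or []
    if len(image_urls) > MAX_IMAGES:
        raise ValueError(f"At most {MAX_IMAGES} images per prompt, got {len(image_urls)}")

    # One text part followed by one image_url part per image.
    content: list[dict[str, Any]] = [{"type": "text", "text": user_text}]
    content += [{"type": "image_url", "image_url": {"url": url}} for url in image_urls]

    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": content},
        ],
        "temperature": RECOMMENDED_TEMPERATURE,
        "max_tokens": max_tokens,
    }
```

With an OpenAI client pointed at a running server, the result could be passed as `client.chat.completions.create(**build_chat_request(...))`. Rejecting oversized image lists on the client side avoids sending a request that exceeds the documented limit.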

### Usage

The model can be used with the following frameworks:
- [`vllm (recommended)`](https://github.com/vllm-project/vllm): See [vLLM section](#vllm-recommended) below
- [`transformers`](https://github.com/huggingface/transformers): See [Transformers section](#transformers) below

**Note 1**: We recommend using a relatively low temperature, such as `temperature=0.15`.

**Note 2**: Make sure to add a system prompt to the model to best tailor it to your needs. If you want to use the model as a general assistant, we recommend using the one provided in the [SYSTEM_PROMPT.txt](https://huggingface.co/mistralai/Mistral-Small-3.2-24B-Instruct-2506/blob/main/SYSTEM_PROMPT.txt) file.

#### vLLM (recommended)

We recommend using this model with [vLLM](https://github.com/vllm-project/vllm).

**Installation**

Make sure to install vLLM >= 0.12.0:

```bash
pip install vllm --upgrade
```

Doing so should automatically install `mistral_common >= 1.8.6`.

To check:
```bash
python -c "import mistral_common; print(mistral_common.__version__)"
```

You can also use a ready-to-go Docker image, either built from the [Dockerfile](https://github.com/vllm-project/vllm/blob/main/docker/Dockerfile) or pulled from [Docker Hub](https://hub.docker.com/layers/vllm/vllm-openai/latest). If you are on Jetson Thor, you can use this [vLLM container](https://github.com/orgs/nvidia-ai-iot/packages/container/vllm/596293600?tag=latest-jetson-thor), and if you are on Jetson Orin, you can use this [vLLM container](https://github.com/orgs/nvidia-ai-iot/packages/container/vllm/596429407?tag=latest-jetson-orin).

**Serve**

We recommend that you use Ministral 3 14B in a server/client setting.

1. Spin up a server:

```bash
vllm serve mistralai/Ministral-3-14B-Instruct-2512 \
--enable-auto-tool-choice --tool-call-parser mistral
```

**Note:** Due to its size and the FP8 format of its weights, Ministral 3 14B Instruct 2512 can run on a single H200 GPU.

**Additional flags:**

* You can set `--max-model-len` to save memory. By default it is set to `262144`, which is larger than most scenarios need.
* You can set `--max-num-batched-tokens` to balance throughput and latency: higher values give higher throughput but also higher latency.
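
To show how these flags combine with the serve command above, here is a minimal sketch that assembles the command line in Python. `build_serve_command` is a hypothetical helper, and the numeric values are illustrative assumptions, not tuned recommendations; only the flag names come from this section.

```python
import shlex


def build_serve_command(
    model: str = "mistralai/Ministral-3-14B-Instruct-2512",
    max_model_len: int = 32768,          # illustrative value, below the 262144 default
    max_num_batched_tokens: int = 8192,  # illustrative value; tune for your workload
) -> list[str]:
    """Return the argv for `vllm serve` with the flags discussed above."""
    return [
        "vllm", "serve", model,
        "--enable-auto-tool-choice",
        "--tool-call-parser", "mistral",
        "--max-model-len", str(max_model_len),
        "--max-num-batched-tokens", str(max_num_batched_tokens),
    ]


command = build_serve_command()
# Print a shell-safe command line; `command` can also be passed to subprocess.run.
print(shlex.join(command))
```

Building the argument list in one place makes it easier to keep context length and batching settings consistent across deployments.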

2. To query the server from a client, you can use a simple Python snippet. See the following examples.

#### Vision reasoning

Leverage the vision capabilities of Ministral 3 14B Instruct 2512 to make the best choice given a scenario.

<details>
<summary>Python snippet</summary>

```python
from datetime import datetime, timedelta

from openai import OpenAI
from huggingface_hub import hf_hub_download

# Modify OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"

TEMP = 0.15
MAX_TOK = 262144

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

models = client.models.list()
model = models.data[0].id

def load_system_prompt(repo_id: str, filename: str) -> str:
    file_path = hf_hub_download(repo_id=repo_id, filename=filename)
    with open(file_path, "r") as file:
        system_prompt = file.read()
    today = datetime.today().strftime("%Y-%m-%d")
    yesterday = (datetime.today() - timedelta(days=1)).strftime("%Y-%m-%d")
    model_name = repo_id.split("/")[-1]
    return system_prompt.format(name=model_name, today=today, yesterday=yesterday)

model_id = "mistralai/Ministral-3-14B-Instruct-2512"
SYSTEM_PROMPT = load_system_prompt(model_id, "SYSTEM_PROMPT.txt")
image_url = "https://static.wikia.nocookie.net/essentialsdocs/images/7/70/Battle.png/revision/latest?cb=20220523172438"

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "What action do you think I should take in this situation? List all the possible actions and explain why you think they are good or bad.",
            },
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    },
]

response = client.chat.completions.create(
    model=model,
    messages=messages,
    temperature=TEMP,
    max_tokens=MAX_TOK,
)

print(response.choices[0].message.content)
```

</details>

#### Function calling

Ministral 3 14B Instruct 2512 is excellent at function / tool calling tasks via vLLM.

<details>
<summary>Python snippet</summary>

```python
import json
from openai import OpenAI
from huggingface_hub import hf_hub_download

# Modify OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"

TEMP = 0.15
MAX_TOK = 262144

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

models = client.models.list()
model = models.data[0].id

def load_system_prompt(repo_id: str, filename: str) -> str:
    file_path = hf_hub_download(repo_id=repo_id, filename=filename)
    with open(file_path, "r") as file:
        system_prompt = file.read()
    return system_prompt

model_id = "mistralai/Ministral-3-14B-Instruct-2512"
SYSTEM_PROMPT = load_system_prompt(model_id, "SYSTEM_PROMPT.txt")

image_url = "https://math-coaching.com/img/fiche/46/expressions-mathematiques.jpg"

def my_calculator(expression: str) -> str:
    # Demo only: eval() executes arbitrary code, so never use it on untrusted input.
    return str(eval(expression))

tools = [
    {
        "type": "function",
        "function": {
            "name": "my_calculator",
            "description": "A calculator that can evaluate a mathematical expression.",
            "parameters": {
                "type": "object",
                "properties": {
                    "expression": {
                        "type": "string",
                        "description": "The mathematical expression to evaluate.",
                    },
                },
                "required": ["expression"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "rewrite",
            "description": "Rewrite a given text for improved clarity",
            "parameters": {
                "type": "object",
                "properties": {
                    "text": {
                        "type": "string",
                        "description": "The input text to rewrite",
                    }
                },
            },
        },
    },
]

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "Thanks to your calculator, compute the results for the equations that involve numbers displayed in the image.",
            },
            {
                "type": "image_url",
                "image_url": {
                    "url": image_url,
                },
            },
        ],
    },
]

response = client.chat.completions.create(
    model=model,
    messages=messages,
    temperature=TEMP,
    max_tokens=MAX_TOK,
    tools=tools,
    tool_choice="auto",
)

tool_calls = response.choices[0].message.tool_calls

results = []
for tool_call in tool_calls:
    function_name = tool_call.function.name
    function_args = tool_call.function.arguments
    if function_name == "my_calculator":
        result = my_calculator(**json.loads(function_args))
        results.append(result)

messages.append({"role": "assistant", "tool_calls": tool_calls})
for tool_call, result in zip(tool_calls, results):
    messages.append(
        {
            "role": "tool",
            "tool_call_id": tool_call.id,
            "name": tool_call.function.name,
            "content": result,
        }
    )

response = client.chat.completions.create(
    model=model,
    messages=messages,
    temperature=TEMP,
    max_tokens=MAX_TOK,
)

print(response.choices[0].message.content)
```

</details>
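
The `my_calculator` tool in the snippet above passes a model-generated string to `eval`, which is fine for a demo but executes arbitrary Python. One possible hardening, sketched below, walks the parsed expression with the standard `ast` module and allows only numeric literals and the four basic arithmetic operators. `safe_calculator` is a hypothetical replacement for illustration, not part of the original example, and it deliberately omits exponentiation to avoid very expensive computations.

```python
import ast
import operator
from typing import Union

Number = Union[int, float]

# Operators the calculator accepts; anything else is rejected.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def _evaluate(node: ast.AST) -> Number:
    """Recursively evaluate a restricted arithmetic expression tree."""
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
        return _BINARY_OPS[type(node.op)](_evaluate(node.left), _evaluate(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
        return _UNARY_OPS[type(node.op)](_evaluate(node.operand))
    raise ValueError(f"Unsupported expression element: {ast.dump(node)}")


def safe_calculator(expression: str) -> str:
    """Evaluate basic arithmetic without calling eval()."""
    tree = ast.parse(expression, mode="eval")
    return str(_evaluate(tree.body))


print(safe_calculator("6 + 2 * (3 - 1)"))  # prints 10
```

Names, function calls, and attribute access all fall through to the `ValueError` branch, so an input such as `__import__('os')` is rejected instead of executed.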

#### Instruction following

Ministral 3 14B Instruct 2512 will follow your instructions precisely.

<details>
<summary>Python snippet</summary>

```python
from openai import OpenAI
from huggingface_hub import hf_hub_download

# Modify OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"

TEMP = 0.15
MAX_TOK = 262144

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

models = client.models.list()
model = models.data[0].id

def load_system_prompt(repo_id: str, filename: str) -> str:
    file_path = hf_hub_download(repo_id=repo_id, filename=filename)
    with open(file_path, "r") as file:
        system_prompt = file.read()
    return system_prompt

model_id = "mistralai/Ministral-3-14B-Instruct-2512"
SYSTEM_PROMPT = load_system_prompt(model_id, "SYSTEM_PROMPT.txt")

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {
        "role": "user",
        "content": "Write me a sentence where every word starts with the next letter in the alphabet - start with 'a' and end with 'z'.",
    },
]

response = client.chat.completions.create(
    model=model,
    messages=messages,
    temperature=TEMP,
    max_tokens=MAX_TOK,
)

assistant_message = response.choices[0].message.content
print(assistant_message)
```

</details>

#### Transformers

You can also use Ministral 3 14B Instruct 2512 FP8 with `Transformers`.

Transformers very recently added preliminary support for FP8, so please make sure to install from main:

```sh
uv pip install git+https://github.com/huggingface/transformers
```

To make the best use of our model with `Transformers` make sure to have [installed](https://github.com/mistralai/mistral-common) `mistral-common >= 1.8.6` to use our tokenizer.

```bash
pip install mistral-common --upgrade
```

Try it out by running the following snippet.

> [!Tip]
> By default Transformers will load the checkpoint in FP8 and dequantize it to BF16 on the fly,
> which means the model currently does not make use of accelerated FP8-kernels.
> Compatibility with accelerated FP8-kernels is currently being worked on and is expected to be available in a couple of weeks.
> Stay tuned!

<details>
<summary>Python snippet</summary>

```python
import torch
from transformers import Mistral3ForConditionalGeneration, MistralCommonBackend

model_id = "mistralai/Ministral-3-14B-Instruct-2512"

tokenizer = MistralCommonBackend.from_pretrained(model_id)
model = Mistral3ForConditionalGeneration.from_pretrained(model_id, device_map="auto")

image_url = "https://static.wikia.nocookie.net/essentialsdocs/images/7/70/Battle.png/revision/latest?cb=20220523172438"

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "What action do you think I should take in this situation? List all the possible actions and explain why you think they are good or bad.",
            },
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    },
]

tokenized = tokenizer.apply_chat_template(messages, return_tensors="pt", return_dict=True)

tokenized["input_ids"] = tokenized["input_ids"].to(device="cuda")
tokenized["pixel_values"] = tokenized["pixel_values"].to(dtype=torch.bfloat16, device="cuda")
image_sizes = [tokenized["pixel_values"].shape[-2:]]

output = model.generate(
    **tokenized,
    image_sizes=image_sizes,
    max_new_tokens=512,
)[0]

decoded_output = tokenizer.decode(output[len(tokenized["input_ids"][0]):])
print(decoded_output)
```

**Note:**

Transformers allows you to automatically convert the checkpoint to Bfloat16. To do so, simply load the model as follows:

```py
from transformers import Mistral3ForConditionalGeneration, FineGrainedFP8Config

model_id = "mistralai/Ministral-3-14B-Instruct-2512"
model = Mistral3ForConditionalGeneration.from_pretrained(
    model_id,
    device_map="auto",
    quantization_config=FineGrainedFP8Config(dequantize=True),
)
```

</details>

## Ethical Considerations

NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.

Please make sure you have proper rights and permissions for all input image content; if image includes people, personal health information, or intellectual property, the image generated will not blur or maintain proportions of image subjects included.

Please report model quality, risk, security vulnerabilities or NVIDIA AI Concerns [here](https://www.nvidia.com/en-us/support/submit-security-vulnerability/).

## Prototype

```python
import requests

invoke_url = "https://integrate.api.nvidia.com/v1/mistralai/ministral-14b-instruct-2512"

headers = {
    "Authorization": "Bearer ",
    "Accept": "application/json",
}

payload = {
    "messages": [
        {
            "role": "user",
            "content": ""
        }
    ]
}

# re-use connections
session = requests.Session()

response = session.post(invoke_url, headers=headers, json=payload)

response.raise_for_status()
response_body = response.json()
print(response_body)
```

```javascript
import fetch from "node-fetch";

const invokeUrl = "https://integrate.api.nvidia.com/v1/mistralai/ministral-14b-instruct-2512"

const headers = {
  "Authorization": "Bearer ",
  "Accept": "application/json",
}

const payload = {
  "messages": [
    {
      "role": "user",
      "content": ""
    }
  ]
}

let response = await fetch(invokeUrl, {
  method: "post",
  body: JSON.stringify(payload),
  headers: { "Content-Type": "application/json", ...headers }
});

let response_body = await response.json()

console.log(JSON.stringify(response_body))
```

```bash
invoke_url='https://integrate.api.nvidia.com/v1/mistralai/ministral-14b-instruct-2512'

authorization_header='Authorization: Bearer '
accept_header='Accept: application/json'
content_type_header='Content-Type: application/json'

data=$'{
  "messages": [
    {
      "role": "user",
      "content": ""
    }
  ]
}'

response=$(curl --silent -i -w "\n%{http_code}" --request POST \
  --url "$invoke_url" \
  --header "$authorization_header" \
  --header "$accept_header" \
  --header "$content_type_header" \
  --data "$data"
)

echo "$response"
```