---
title: "stockmark-2-100b-instruct"
publisher: "stockmark"
type: "endpoint"
updated: "2025-09-24T23:32:29.350Z"
description: "Japanese-specialized large language model for enterprises to read and understand complex business documents."
canonical: "https://build.nvidia.com/stockmark/stockmark-2-100b-instruct"
---

# Stockmark-2-100B-Instruct

## Description
Stockmark-2-100B-Instruct is a 100-billion-parameter large language model built from scratch, with a particular focus on Japanese. It was pre-trained on approximately 2.0 trillion tokens of data, consisting of 60% English, 30% Japanese, and 10% code. Following pre-training, the model underwent post-training (SFT and DPO) with synthetic Japanese data to improve its instruction following. Compared with the previous version, this version improves instruction following and adds long-context support (32k tokens).
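The pre-training mix stated above can be turned into approximate per-category token counts. The sketch below only restates that arithmetic; the function and constant names are illustrative and do not come from Stockmark.

```python
# Approximate pre-training token counts per category, from the stated figures.
TOTAL_TOKENS = 2.0e12  # "approximately 2.0 trillion tokens"
MIX = {"english": 0.60, "japanese": 0.30, "code": 0.10}


def tokens_per_category(total=TOTAL_TOKENS, mix=MIX):
    """Return the approximate token count for each data category."""
    return {name: total * share for name, share in mix.items()}


if __name__ == "__main__":
    for name, count in tokens_per_category().items():
        print(f"{name}: ~{count / 1e12:.1f}T tokens")
```

Because the total is itself approximate, the per-category numbers should be read as rough magnitudes, not exact counts.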

*This model is ready for commercial and non-commercial use*

## Third-Party Community Consideration:
This model is not owned or developed by NVIDIA. This model has been developed and built to a third-party's requirements for this application and use case; see the non-NVIDIA [Stockmark-2-100B-Instruct Model Card](https://huggingface.co/stockmark/Stockmark-2-100B-Instruct).

## License and Terms of Use:
**GOVERNING TERMS:** The trial service is governed by the [NVIDIA API Trial Terms of Service](https://assets.ngc.nvidia.com/products/api-catalog/legal/NVIDIA%20API%20Trial%20Terms%20of%20Service.pdf). Use of this model is governed by the [NVIDIA Open Model License Agreement](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/). Additional Information: [MIT License](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/mit.md).

## Deployment Geography:
**Deployment Geography:** Global

## Use Case:
**Use Case:** Japanese and English language processing, instruction following, long-context understanding, research and commercial applications.

## Release Date:
**build.nvidia.com 09/24/2025 via [link](https://build.nvidia.com/stockmark/stockmark-2-100b-instruct)**  <br>
**Hugging Face 09/24/2025 via [link](https://huggingface.co/stockmark/Stockmark-2-100B-Instruct)**

## Reference(s):
[Stockmark Inc.](https://stockmark.co.jp/) <br> 
[GENIAC](https://www.meti.go.jp/policy/mono_info_service/geniac/index.html)

## Model Architecture:
**Architecture Type:** Causal Language Model  <br>
**Network Architecture:** Transformer-based with Grouped Query Attention (GQA)  <br>
**Total Parameters:** 96B  <br>
**Active Parameters:** 96B  <br>
**Vocabulary Size:** 100352  

### Input:
**Input Types:** Text   <br>
**Input Parameters:** [One-Dimensional (1D)] <br>
**Other Input Properties:** Supports Japanese and English languages   <br>
**Input Context Length (ISL):** 32,000 tokens

### Output:
**Output Type:** Text  <br>
**Output Parameters:** [One-Dimensional (1D)] <br>
**Other Output Properties:** Instruction-following responses in Japanese and English   <br>
**Output Context Length (OSL):** Up to 32,768 tokens (shared with input)
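Because the output length is described as shared with the input, the room left for a completion shrinks as the prompt grows. A minimal sketch of that budget, assuming the 32,768-token figure is the total window and that the prompt's token count has already been obtained from the model's tokenizer (not shown); the helper name is hypothetical:

```python
# Total tokens shared by prompt and completion, per the Output section (assumed total window).
CONTEXT_WINDOW = 32_768


def max_completion_tokens(prompt_tokens: int, context_window: int = CONTEXT_WINDOW) -> int:
    """Return how many tokens remain for the completion after the prompt."""
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    # Clamp at zero: a prompt that fills the window leaves no room to generate.
    return max(context_window - prompt_tokens, 0)


if __name__ == "__main__":
    print(max_completion_tokens(30_000))
```

A value like this could be used as an upper bound when choosing `max_tokens` for a request.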

__Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g. GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.__

## Software Integration:
**Runtime Engines:** PyTorch, transformers, vLLM  
**Supported Hardware:**
- NVIDIA Ada Lovelace
- NVIDIA Ampere
- NVIDIA Blackwell
- NVIDIA Hopper

**Operating Systems:** Linux, Windows, macOS

## Model Version(s)
Stockmark-2-100B-Instruct

## Training, Testing, and Evaluation Datasets:

### Training Dataset
**Training Data Collection:** Synthetic <br>
**Training Labeling:** Synthetic <br>
**Data Modality:** Text <br>
**Text Training Data Size:** 1 Billion to 10 Trillion Tokens <br>
**Training Properties:** Post-training with SFT and DPO methods

### Testing Dataset
**Testing Data Collection:** Undisclosed <br>
**Testing Labeling:** Undisclosed <br>
**Testing Properties:** Undisclosed <br>

### Evaluation Dataset
**Evaluation Benchmark Score:** Japanese MT-bench Average: 7.87  <br>
**Evaluation Data Collection:** Japanese MT-bench evaluation <br>
**Evaluation Labeling:** Automated scoring system  <br>
**Evaluation Properties:** Multi-domain evaluation including coding, extraction, humanities, math, reasoning, roleplay, and STEM

## Inference
**Acceleration Engine:** vLLM, TensorRT-LLM <br>
**Test Hardware:** 4x H100
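As a rough sanity check on the four-GPU test setup, the weight footprint can be estimated from the parameter count. The sketch below assumes 2 bytes per parameter (bf16 or fp16) and 80 GB of memory per H100; neither figure is stated in this card, and the estimate ignores KV cache and activation memory.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Estimate model weight memory in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9


if __name__ == "__main__":
    weights_gb = weight_memory_gb(96e9)  # 96B parameters, assumed 16-bit weights
    total_gpu_gb = 4 * 80                # assumed 80 GB per H100
    print(f"weights: {weights_gb:.0f} GB, GPUs: {total_gpu_gb} GB, "
          f"headroom: {total_gpu_gb - weights_gb:.0f} GB")
```

Under these assumptions the weights alone would not fit on two such GPUs, which is consistent with a four-GPU setup.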

## Additional Information
This project was supported by GENIAC. The model uses Grouped Query Attention (GQA) with 72 query heads and 8 key-value heads. Training libraries include NVIDIA/Megatron-LM for pre-training and huggingface/trl for post-training.
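With 72 query heads and 8 key-value heads, each key-value head serves a group of 9 query heads. The sketch below illustrates that grouping using the common contiguous-block convention; the exact head layout inside this model is an assumption, and the function name is hypothetical.

```python
NUM_QUERY_HEADS = 72
NUM_KV_HEADS = 8


def kv_head_for_query_head(q_head: int,
                           num_q: int = NUM_QUERY_HEADS,
                           num_kv: int = NUM_KV_HEADS) -> int:
    """Map a query head index to the key-value head it shares (contiguous groups)."""
    if num_q % num_kv != 0:
        raise ValueError("query heads must divide evenly among key-value heads")
    if not 0 <= q_head < num_q:
        raise IndexError("query head index out of range")
    group_size = num_q // num_kv  # 72 // 8 = 9 query heads per key-value head
    return q_head // group_size


if __name__ == "__main__":
    print([kv_head_for_query_head(q) for q in (0, 8, 9, 71)])
```

Compared with giving every query head its own key-value head at the same head dimension, this grouping stores one ninth as many key-value heads in the cache.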

## Limitations
This model should be used responsibly and in accordance with applicable laws and regulations. Users should be aware of potential biases in the training data and outputs, particularly when processing content in Japanese and English languages.

## Ethical Considerations
NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.
Please report model quality, risk, security vulnerabilities or NVIDIA AI Concerns [here](https://www.nvidia.com/en-us/support/submit-security-vulnerability/).

## Prototype

```python
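import os

from openai import OpenAI

# Read the key from the NVIDIA_API_KEY environment variable rather than hard-coding it.
client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",
    api_key=os.environ["NVIDIA_API_KEY"],
)

completion = client.chat.completions.create(
    model="stockmark/stockmark-2-100b-instruct",
    # Placeholder prompt; replace with your own.
    messages=[{"role": "user", "content": "日本の首都はどこですか？"}],
    # Sampling values below are illustrative placeholders, not documented defaults.
    temperature=0.2,
    top_p=0.7,
    max_tokens=1024,
    stream=False,
)

print(completion.choices[0].message.content)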
```

```javascript
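import OpenAI from 'openai';

// Read the key from the NVIDIA_API_KEY environment variable rather than hard-coding it.
const openai = new OpenAI({
  apiKey: process.env.NVIDIA_API_KEY,
  baseURL: 'https://integrate.api.nvidia.com/v1',
});

async function main() {
  const completion = await openai.chat.completions.create({
    model: "stockmark/stockmark-2-100b-instruct",
    // Placeholder prompt; replace with your own.
    messages: [{ role: "user", content: "日本の首都はどこですか？" }],
    // Sampling values below are illustrative placeholders, not documented defaults.
    temperature: 0.2,
    top_p: 0.7,
    max_tokens: 1024,
    stream: false,
  });

  // Fall back to an empty string so write() never receives undefined.
  process.stdout.write(completion.choices[0]?.message?.content ?? "");
}

main();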
```

```bash
# The prompt and sampling values are illustrative placeholders, not documented defaults.
curl https://integrate.api.nvidia.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $NVIDIA_API_KEY" \
  -d '{
    "model": "stockmark/stockmark-2-100b-instruct",
    "messages": [{"role":"user","content":"日本の首都はどこですか？"}],
    "temperature": 0.2,
    "top_p": 0.7,
    "max_tokens": 1024,
    "stream": false
  }'
```
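The requests above expose a `stream` option. A minimal sketch of consuming a streamed response with the OpenAI Python client follows, assuming the endpoint emits OpenAI-style chat-completion chunks; the helper `collect_stream_text`, the prompt, and the `max_tokens` value are illustrative, not part of this card.

```python
import os
from typing import Iterable


def collect_stream_text(chunks: Iterable) -> str:
    """Concatenate the text deltas from an iterable of chat-completion chunks."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:
            continue  # some chunks may carry no choices
        delta = chunk.choices[0].delta.content
        if delta:  # skip role-only or empty deltas
            parts.append(delta)
    return "".join(parts)


def main() -> None:
    # Imported here so the helper above can be used without the SDK installed.
    from openai import OpenAI

    client = OpenAI(
        base_url="https://integrate.api.nvidia.com/v1",
        api_key=os.environ["NVIDIA_API_KEY"],
    )
    stream = client.chat.completions.create(
        model="stockmark/stockmark-2-100b-instruct",
        messages=[{"role": "user", "content": "日本の首都はどこですか？"}],  # placeholder prompt
        max_tokens=256,  # illustrative value
        stream=True,
    )
    print(collect_stream_text(stream))

# Call main() to run against the live endpoint; it requires NVIDIA_API_KEY to be set.
```

Collecting the deltas into one string keeps the example short; an interactive application would more likely print each delta as it arrives.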