databricks/dbrx-instruct

PREVIEW

A general-purpose LLM with state-of-the-art performance in language understanding, coding, and RAG.

Model Overview

Description:

DBRX is a transformer-based decoder-only large language model (LLM) that was trained using next-token prediction. It uses a fine-grained mixture-of-experts (MoE) architecture with 132B total parameters, of which 36B are active on any input.

Compared to other open MoE models like Mixtral-8x7B and Grok-1, DBRX is fine-grained, meaning it uses a larger number of smaller experts. DBRX has 16 experts and chooses 4, while Mixtral-8x7B and Grok-1 have 8 experts and choose 2. This provides 65x more possible combinations of experts, and we found that this improves model quality. DBRX uses rotary position encodings (RoPE), gated linear units (GLU), and grouped query attention (GQA). It uses a converted version of the GPT-4 tokenizer as defined in the tiktoken repository. We made these choices based on exhaustive evaluation and scaling experiments.
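The top-k expert routing described above can be sketched in a few lines. The following is a minimal, self-contained illustration (not DBRX's actual implementation): a PyTorch layer that routes each token to 4 of 16 small experts, so only the router, the shared layers, and the four selected experts contribute parameters to any given input. The hidden sizes and GLU-style expert blocks are made-up placeholders, and the first print statements simply verify the 65x figure (C(16,4) = 1820 vs. C(8,2) = 28).

    import math
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    # The 65x figure: ways to pick 4 of 16 experts vs. 2 of 8.
    print(math.comb(16, 4), math.comb(8, 2))           # 1820 28
    print(math.comb(16, 4) / math.comb(8, 2))          # 65.0

    class TopKMoE(nn.Module):
        """Minimal fine-grained MoE layer: each token is routed to k of n experts."""

        def __init__(self, d_model=512, d_hidden=1024, n_experts=16, k=4):
            super().__init__()
            self.k = k
            self.router = nn.Linear(d_model, n_experts, bias=False)
            # Illustrative expert blocks; the sizes here are placeholders, not DBRX's.
            self.experts = nn.ModuleList(
                [nn.Sequential(nn.Linear(d_model, d_hidden),
                               nn.SiLU(),
                               nn.Linear(d_hidden, d_model))
                 for _ in range(n_experts)]
            )

        def forward(self, x):                          # x: (tokens, d_model)
            scores = self.router(x)                    # (tokens, n_experts)
            top_w, top_idx = scores.topk(self.k, dim=-1)
            top_w = F.softmax(top_w, dim=-1)           # weights over the chosen experts
            out = torch.zeros_like(x)
            # Only the k selected experts run for each token, so only their
            # parameters (plus the shared layers) are active for that input.
            for slot in range(self.k):
                for e, expert in enumerate(self.experts):
                    hit = top_idx[:, slot] == e
                    if hit.any():
                        out[hit] += top_w[hit, slot, None] * expert(x[hit])
            return out

    layer = TopKMoE()
    tokens = torch.randn(8, 512)
    print(layer(tokens).shape)                         # torch.Size([8, 512])

Production MoE implementations batch tokens per expert rather than looping as above, but the per-token selection logic is the same.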

Third-Party Community Consideration

This model is not owned or developed by NVIDIA. It has been developed and built to a third party’s requirements for this application and use case; see the Non-NVIDIA DBRX Model Card for details.

License and Terms of Use

GOVERNING TERMS: Your use of this API is governed by the NVIDIA API Trial Service Terms of Use; the use of this model is governed by the NVIDIA AI Foundation Models Community License and the Databricks Open Model License.

Reference(s):

Blog post

Model Architecture:

Architecture Type: Transformer
Network Architecture: Fine-grained Mixture of Experts (MoE)

Input:

Input Format: Text
Input Parameters: Temperature, Top P, Max Output Tokens (see the example request below)

Output:

Output Format: Text
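The input parameters above correspond to the standard sampling controls of an OpenAI-compatible chat completions request. The snippet below is a hedged sketch of one way to call the model from Python: the base URL, the NVIDIA_API_KEY environment variable, and the exact model identifier are assumptions based on the catalog's usual conventions, so verify them against the official sample code on this model's page.

    import os
    from openai import OpenAI   # OpenAI-compatible Python client

    # Assumed endpoint and credential handling; check the model page's own
    # sample code before relying on these exact values.
    client = OpenAI(
        base_url="https://integrate.api.nvidia.com/v1",
        api_key=os.environ["NVIDIA_API_KEY"],   # hypothetical env var for your key
    )

    completion = client.chat.completions.create(
        model="databricks/dbrx-instruct",
        messages=[{"role": "user",
                   "content": "Explain retrieval-augmented generation in two sentences."}],
        temperature=0.5,   # Temperature
        top_p=0.9,         # Top P
        max_tokens=512,    # Max Output Tokens
    )

    print(completion.choices[0].message.content)   # plain-text output

Lower temperature and top_p values make sampling more deterministic, while max_tokens caps the length of the generated reply.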

Software Integration:

  • Supported Hardware Platform(s): Hopper

Supported Operating System(s):

  • Linux

Training, Testing, and Evaluation Datasets:

Training Dataset:

Properties (Quantity, Dataset Descriptions, Sensor(s)): Pre-trained on 12T tokens of text and code data.

Inference:

Engine: Triton, TRT-LLM
Test Hardware: H100