An embedding model for text retrieval tasks, excelling in dense, multi-vector, and sparse retrieval.
BGE-M3 is distinguished by its versatility: Multi-Functionality (it can simultaneously perform dense, sparse, and multi-vector retrieval), Multi-Linguality (it supports more than 100 working languages), and Multi-Granularity (it handles inputs from short sentences to documents of up to 8192 tokens).
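A minimal sketch of the three retrieval modes using the FlagEmbedding library (assuming it is installed, e.g. `pip install -U FlagEmbedding`); the output keys follow the BGE-M3 usage examples:

```python
from FlagEmbedding import BGEM3FlagModel

model = BGEM3FlagModel("BAAI/bge-m3", use_fp16=True)  # fp16 speeds up inference

out = model.encode(
    ["BGE-M3 supports dense, sparse, and multi-vector retrieval."],
    return_dense=True,         # one 1024-dim vector per input string
    return_sparse=True,        # per-token lexical weights (BM25-like)
    return_colbert_vecs=True,  # one vector per token, for ColBERT-style scoring
)
print(out["dense_vecs"].shape)       # (1, 1024)
print(out["lexical_weights"][0])     # {token_id: weight, ...}
print(out["colbert_vecs"][0].shape)  # (num_tokens, 1024)
```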
Some suggestions for a retrieval pipeline in RAG
The authors recommend the following pipeline: hybrid retrieval + re-ranking.
Hybrid retrieval leverages the strengths of multiple methods, offering higher accuracy and stronger generalization. A classic example is combining embedding retrieval with the BM25 algorithm. With BGE-M3, which supports both embedding and sparse retrieval, you can obtain token weights (similar to BM25) at no additional cost when generating dense embeddings. To deploy hybrid retrieval, refer to Vespa and Milvus.
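A minimal sketch of hybrid scoring with FlagEmbedding, combining the dense and sparse signals from a single encode pass; the 0.6/0.4 weights are illustrative assumptions, not tuned values:

```python
from FlagEmbedding import BGEM3FlagModel

model = BGEM3FlagModel("BAAI/bge-m3", use_fp16=True)

query = "What is BGE-M3?"
passage = "BGE-M3 is a multilingual embedding model supporting dense and sparse retrieval."

q = model.encode([query], return_dense=True, return_sparse=True)
p = model.encode([passage], return_dense=True, return_sparse=True)

# Dense score: inner product of the two dense vectors.
dense_score = q["dense_vecs"][0] @ p["dense_vecs"][0].T

# Sparse (lexical) score: overlap of token weights, BM25-style matching.
sparse_score = model.compute_lexical_matching_score(
    q["lexical_weights"][0], p["lexical_weights"][0]
)

# Hybrid score: a weighted sum; tune the weights on your own data.
hybrid_score = 0.6 * dense_score + 0.4 * sparse_score
```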
As cross-encoder models, re-rankers demonstrate higher accuracy than bi-encoder embedding models. Applying a re-ranking model (e.g., bge-reranker, bge-reranker-v2) after retrieval can further filter the retrieved text.
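A minimal sketch of re-ranking with FlagEmbedding's FlagReranker, using bge-reranker-v2-m3 as an illustrative model choice; the raw scores are relevance logits, so only their ordering matters here:

```python
from FlagEmbedding import FlagReranker

reranker = FlagReranker("BAAI/bge-reranker-v2-m3", use_fp16=True)

query = "What is BGE-M3?"
candidates = [
    "BGE-M3 is a multilingual embedding model.",
    "BM25 is a classic sparse retrieval algorithm.",
]

# Score each (query, passage) pair jointly with the cross-encoder,
# then keep the highest-scoring passages.
scores = reranker.compute_score([[query, c] for c in candidates])
ranked = sorted(zip(candidates, scores), key=lambda x: x[1], reverse=True)
```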
Model Name | Dimension | Sequence Length | Introduction |
---|---|---|---|
BAAI/bge-m3 | 1024 | 8192 | multilingual; unified fine-tuning (dense, sparse, and colbert) from bge-m3-unsupervised |
BAAI/bge-m3-unsupervised | 1024 | 8192 | multilingual; contrastive learning from bge-m3-retromae |
BAAI/bge-m3-retromae | -- | 8192 | multilingual; extend the max_length of xlm-roberta to 8192 and further pretrained via retromae |
BAAI/bge-large-en-v1.5 | 1024 | 512 | English model |
BAAI/bge-base-en-v1.5 | 768 | 512 | English model |
BAAI/bge-small-en-v1.5 | 384 | 512 | English model |
BGE-M3 is licensed under the MIT License.
NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. Please report security vulnerabilities or NVIDIA AI Concerns here.
Architecture Type: Transformer
Network Architecture: Fine-tuned XLMRobertaModel
Embedding Dimension: 1024
Parameter Count: 568 million
Input Type: Text
Input Format: List of strings
Output Type: Floating Points
Output Format: List of float arrays
Other Properties Related to Output: Each array contains the embeddings for the corresponding input string.
Dataset | Introduction |
---|---|
MLDR | Document Retrieval Dataset, covering 13 languages |
bge-m3-data | Fine-tuning data used by bge-m3 |
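A hedged sketch of loading one language split of MLDR for evaluation; it assumes the dataset is published as "Shitao/MLDR" on the Hugging Face Hub (the namespace used by the BGE-M3 release) with per-language configs, so adjust the identifier if it is hosted elsewhere:

```python
from datasets import load_dataset

# Assumption: "Shitao/MLDR" with config "en" and a "test" split;
# MLDR covers 13 languages, each available as its own config.
mldr_en = load_dataset("Shitao/MLDR", "en", split="test")
print(mldr_en[0].keys())
```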