VISTA-3D is a specialized interactive foundation model for segmenting and annotating human anatomies.
VISTA-3D is a specialized interactive foundation model for 3D medical imaging. It excels in providing accurate and adaptable segmentation analysis across anatomies and modalities. Utilizing a multi-head architecture, VISTA-3D adapts to varying conditions and anatomical areas, helping guide users' annotation workflow. This model is for research purposes and not for clinical usage.
The VISTA3D model was trained on a large and diverse dataset of 11454 3D CT volumes. This dataset was curated from in-house and publicly available sources. The training data encompassed a wide range of acquisition protocols.
The spatial resolutions of the scans varied significantly, ranging from 0.45 × 0.45 × 0.45 mm³ to 1.50 × 1.50 × 7.50 mm³, with a median resolution of 0.88 × 0.88 × 1.50 mm³. This indicates that the training data included scans with varying slice thicknesses and in-plane resolutions.
The paper and its supplementary material do not explicitly report the gender breakdown of the participants within these datasets. Table 1 in the supplementary material lists the datasets used and the number of cases, but it does not include demographic information such as gender. Similarly, Figure 1 in the supplementary material shows the distribution of annotated voxels per class but does not include gender information.
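To make the reported range concrete, the short sketch below computes the physical volume of a single voxel and an anisotropy ratio (largest spacing divided by smallest) for the three spacings quoted above. The dictionary keys and function names are illustrative choices, not part of the model or its tooling.

```python
from math import prod

# Voxel spacings (x, y, z) in millimetres, copied from the reported range.
# The labels "finest", "median", and "coarsest" are illustrative names only.
SPACINGS_MM = {
    "finest": (0.45, 0.45, 0.45),
    "median": (0.88, 0.88, 1.50),
    "coarsest": (1.50, 1.50, 7.50),
}


def voxel_volume_mm3(spacing):
    """Physical volume of one voxel in cubic millimetres."""
    return prod(spacing)


def anisotropy(spacing):
    """Ratio of the largest to the smallest spacing; 1.0 means isotropic."""
    return max(spacing) / min(spacing)


for name, spacing in SPACINGS_MM.items():
    print(
        f"{name}: volume={voxel_volume_mm3(spacing):.3f} mm^3, "
        f"anisotropy={anisotropy(spacing):.2f}"
    )
```

At the coarse end of the range, the slice thickness is five times the in-plane spacing, which is the kind of anisotropy a model trained on this data has been exposed to.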
Other relevant details include:
The VISTA3D model is intended to facilitate clinicians and researchers using 3D Computed Tomography (CT) images. As a highly accurate and clinically applicable segmentation foundation model, it aims to streamline workflows in medical image analysis. Specifically, CT image segmentation can aid in diagnosis, treatment planning, and disease monitoring by providing detailed morphological information of body structures and abnormalities. VISTA3D aims to reduce the time-consuming and tedious nature of manual segmentation in clinical practice.
VISTA3D possesses several essential capabilities for 3D CT image segmentation:
The VISTA3D model architecture includes two branches, an automatic branch for direct segmentation of supported classes and an interactive branch that accepts user clicks for both supported and novel zero-shot classes. These branches share the same image encoder.
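The two-branch layout described above can be pictured with a toy sketch: the image is encoded once, and the resulting features are passed to an automatic head (driven by class indices) and/or an interactive head (driven by click points). This is only an illustration of the control flow under simplifying assumptions. `TwoBranchSegmenter`, its arguments, and the placeholder lambdas are hypothetical and do not reflect the actual VISTA3D or MONAI implementation.

```python
from typing import Callable, Dict, List, Optional, Tuple

Point = Tuple[int, int, int]  # (x, y, z) voxel index of a user click


class TwoBranchSegmenter:
    """Toy stand-in for a model with one shared encoder and two branches."""

    def __init__(self, encoder: Callable, auto_head: Callable, point_head: Callable):
        self.encoder = encoder
        self.auto_head = auto_head
        self.point_head = point_head
        self.encoder_calls = 0  # counts how often the shared encoder runs

    def segment(
        self,
        image,
        class_ids: Optional[List[int]] = None,
        points: Optional[List[Point]] = None,
    ) -> Dict[str, object]:
        if not class_ids and not points:
            raise ValueError("provide class_ids, points, or both")
        # The encoder runs once; both branches reuse the same features.
        features = self.encoder(image)
        self.encoder_calls += 1
        outputs: Dict[str, object] = {}
        if class_ids:
            outputs["automatic"] = self.auto_head(features, class_ids)
        if points:
            outputs["interactive"] = self.point_head(features, points)
        return outputs


# Placeholder callables stand in for real network components.
model = TwoBranchSegmenter(
    encoder=lambda image: ("features", image),
    auto_head=lambda feats, ids: f"masks for classes {ids}",
    point_head=lambda feats, pts: f"mask from {len(pts)} click(s)",
)
result = model.segment("ct_volume", class_ids=[1, 3], points=[(40, 52, 17)])
print(result)
print("encoder calls:", model.encoder_calls)
```

Sharing the encoder means the expensive feature extraction is not repeated when a user moves from automatic results to click-based refinement on the same volume.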
By using this model, you are agreeing to the terms and conditions of the license.
Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C. Berg, Wan-Yen Lo, Piotr Dollár, Ross Girshick, 2023. Segment Anything. arXiv:2304.02643
Architecture Type: Transformer
Network Architecture: SegResNet + Prompt Encoding
Input Type(s): Computed Tomography (CT) Image
Input Format(s): NIfTI (Neuroimaging Informatics Technology Initiative)
Input Parameters: Three-Dimensional (3D)
Other Properties Related to Input: Array of Class/Point Information
Output Type(s): Image
Output Format: NIfTI
Output Parameters: 3D
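As an illustration of what an "array of class/point information" accompanying a 3D volume might look like, the sketch below defines a small prompt container and checks it against the volume shape. All names here (`SegmentationPrompt`, `class_ids`, `points`, `point_labels`) and the label convention (1 for foreground, 0 for background) are assumptions made for the example. They are not the model's documented request schema.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class SegmentationPrompt:
    """Hypothetical container for class and point prompts on a 3D volume."""

    class_ids: List[int] = field(default_factory=list)
    points: List[Tuple[int, int, int]] = field(default_factory=list)
    # Assumed convention: 1 marks a foreground click, 0 a background click.
    point_labels: List[int] = field(default_factory=list)

    def validate(self, volume_shape: Tuple[int, int, int]) -> None:
        """Raise ValueError if the prompt is inconsistent with the volume."""
        if len(volume_shape) != 3:
            raise ValueError("volume must be three-dimensional")
        if not self.class_ids and not self.points:
            raise ValueError("prompt needs at least one class id or point")
        if len(self.points) != len(self.point_labels):
            raise ValueError("each point needs exactly one label")
        for point in self.points:
            inside = all(0 <= c < size for c, size in zip(point, volume_shape))
            if len(point) != 3 or not inside:
                raise ValueError(f"point {point} lies outside the volume")
        if any(label not in (0, 1) for label in self.point_labels):
            raise ValueError("point labels must be 0 or 1")


prompt = SegmentationPrompt(class_ids=[1], points=[(10, 20, 30)], point_labels=[1])
prompt.validate(volume_shape=(128, 128, 64))  # passes silently
```

Validating prompts before inference catches clicks that fall outside the volume, for example after the image has been cropped or resampled.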
Runtime Engine(s):
MONAI Core v.1.4
Supported Hardware Microarchitecture Compatibility:
[Preferred/Supported] Operating System(s):
Engine: Triton
Test Hardware: A100, H100, L40
NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their supporting model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. For more detailed information on ethical considerations for this model, please see the Model Card++ Explainability, Bias, Safety & Security, and Privacy Subcards. Please report security vulnerabilities or NVIDIA AI Concerns here.
He, Y., Guo, P., Tang, Y., Myronenko, A., Nath, V., Xu, Z., ... & Li, W. (2024). VISTA3D: Versatile imaging segmentation and annotation model for 3D computed tomography. CVPR 2025.