mit / diffdock

Predicts the 3D structure of how a molecule interacts with a protein.

Tags: BioNeMo | Chemistry | Docking | NIM | Drug Discovery
Accelerated by DGX Cloud

Model Overview

Description:

DiffDock is a generative diffusion model for drug discovery in molecular blind docking.

DiffDock consists of two models: a Score model and a Confidence model. The Score model generates a set of candidate protein-ligand binding poses by running a reverse diffusion process, and the Confidence model ranks them.

DiffDock does not require any information about a binding pocket. During the diffusion process, the molecule's position relative to the protein, its orientation, and its torsion angles are all allowed to change. Running the learned reverse diffusion process transforms a distribution of noisy prior poses into the pose distribution learned by the model. As a result, DiffDock outputs many sampled poses and ranks them with its Confidence model.
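
The generate-then-rank workflow can be summarized as follows. This is a minimal conceptual sketch, not the DiffDock implementation: score_model, confidence_model, and sample_prior are placeholder callables, and the real model works on featurized protein-ligand graphs with a learned noise schedule.

```python
import torch


def blind_dock(score_model, confidence_model, sample_prior,
               protein, num_poses=10, steps=20):
    """Conceptual generate-then-rank loop in the spirit of DiffDock.

    All callables are placeholders supplied by the caller; the real model
    operates on featurized protein-ligand graphs with a learned noise
    schedule, not on the raw objects sketched here.
    """
    poses = []
    for _ in range(num_poses):
        # Draw a noisy prior pose: random translation, orientation, torsions.
        pose = sample_prior()
        # Reverse diffusion: the Score model iteratively refines the ligand's
        # position, orientation, and torsion angles toward the learned
        # pose distribution.
        for t in reversed(range(steps)):
            pose = score_model(pose, protein, t)
        poses.append(pose)
    # Rank all sampled poses with the Confidence model (higher = more reliable).
    confidences = torch.stack([confidence_model(pose, protein) for pose in poses])
    order = torch.argsort(confidences, descending=True)
    return [poses[int(i)] for i in order], confidences[order]
```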

Leveraging the same neural-network architecture as the original DiffDock from MIT, model v2.2.0 is trained by NVIDIA on PLINDER and SAIR, state-of-the-art datasets of well-curated and labeled protein-ligand complexes, and therefore delivers much higher accuracy on molecular docking tasks.

This model is ready for commercial and non-commercial use.

License/Terms of Use:

Use of this model is governed by the NVIDIA Open Model License. Additional Information: MIT.

Deployment Geography:

Global

Use Case:

DiffDock is designed for computational chemists, bioinformaticians, and pharmaceutical researchers. Its primary use case is predicting the binding poses of small molecules (ligands) to target proteins, facilitating drug discovery by identifying and optimizing potential therapeutic compounds.

Release Date:

  • GitHub: 10/03/2022 via github.com/gcorso/DiffDock
  • Hugging Face: 12/01/2022 via huggingface.co/spaces/simonduerr/diffdock
  • build.nvidia.com: 01/08/2026 via build.nvidia.com/mit/diffdock
  • NGC: 01/08/2026 via catalog.ngc.nvidia.com

References:

@article {Durairaj2024.07.17.603955,
	author = {Durairaj, Janani and Adeshina, Yusuf and Cao, Zhonglin and Zhang, Xuejin and Oleinikovas, Vladas and Duignan, Thomas and McClure, Zachary and Robin, Xavier and Studer, Gabriel and Kovtun, Daniel and Rossi, Emanuele and Zhou, Guoqing and Veccham, Srimukh and Isert, Clemens and Peng, Yuxing and Sundareson, Prabindh and Akdel, Mehmet and Corso, Gabriele and St{\"a}rk, Hannes and Tauriello, Gerardo and Carpenter, Zachary and Bronstein, Michael and Kucukbenli, Emine and Schwede, Torsten and Naef, Luca},
	title = {PLINDER: The protein-ligand interactions dataset and evaluation resource},
	elocation-id = {2024.07.17.603955},
	year = {2024},
	doi = {10.1101/2024.07.17.603955},
	publisher = {Cold Spring Harbor Laboratory},
	abstract = {Protein-ligand interactions (PLI) are foundational to small molecule drug design. With computational methods striving towards experimental accuracy, there is a critical demand for a well-curated and diverse PLI dataset. Existing datasets are often limited in size and diversity, and commonly used evaluation sets suffer from training information leakage, hindering the realistic assessment of method generalization capabilities. To address these shortcomings, we present PLINDER, the largest and most annotated dataset to date, comprising 449,383 PLI systems, each with over 500 annotations, similarity metrics at protein, pocket, interaction and ligand levels, and paired unbound (apo) and predicted structures. We propose an approach to generate training and evaluation splits that minimizes task-specific leakage and maximizes test set quality, and compare the resulting performance of DiffDock when retrained with different kinds of splits. Competing Interest Statement: The authors have declared no competing interest.},
	URL = {https://www.biorxiv.org/content/early/2024/07/19/2024.07.17.603955.1},
	eprint = {https://www.biorxiv.org/content/early/2024/07/19/2024.07.17.603955.1.full.pdf},
	journal = {bioRxiv}
}
@article{corso2023diffdock,
      title={DiffDock: Diffusion Steps, Twists, and Turns for Molecular Docking}, 
      author = {Corso, Gabriele and Stärk, Hannes and Jing, Bowen and Barzilay, Regina and Jaakkola, Tommi},
      journal={International Conference on Learning Representations (ICLR)},
      year={2023}
}

Model Architecture:

Architecture Type: Score-Based Diffusion Model (SBDM)
Network Architecture: Graph Convolutional Neural Network

The Score model is a 3D-equivariant graph neural network with three stages: an embedding layer, an interaction stage with six graph convolution layers, and an output layer. In total, the Score model has 20M parameters.
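
For orientation, the three-stage layout can be sketched as a PyTorch skeleton. The layer widths and the plain linear layers standing in for graph convolutions are illustrative assumptions, not the actual equivariant DiffDock network.

```python
import torch
import torch.nn as nn


class ScoreModelSkeleton(nn.Module):
    """Illustrative three-stage layout: embedding -> six interaction
    (graph-convolution) layers -> output head. The widths and the plain
    linear layers standing in for graph convolutions are placeholders,
    not the real equivariant DiffDock network."""

    def __init__(self, in_dim=64, hidden_dim=128, out_dim=3, num_conv_layers=6):
        super().__init__()
        self.embedding = nn.Linear(in_dim, hidden_dim)
        self.interaction = nn.ModuleList(
            [nn.Linear(hidden_dim, hidden_dim) for _ in range(num_conv_layers)]
        )
        self.output = nn.Linear(hidden_dim, out_dim)

    def forward(self, node_features):
        h = torch.relu(self.embedding(node_features))
        for conv in self.interaction:
            # Placeholder for message passing over the protein-ligand graph.
            h = torch.relu(conv(h))
        return self.output(h)
```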

Input:

Input Type(s): Text (Ligand, Protein); Number (Poses to Generate, Batch Size, Diffusion Steps, Diffusion Time Divisions); Binary (No Final Step Noise, Save Diffusion Trajectory, Skip Gen Conformer)
Input Format(s): Text: String (SMILES, Structural Data File (SDF), or Tripos molecule structure (Mol2) for Ligand; Protein Data Bank (PDB) for Protein); Number: Integer; Binary: Boolean
Input Parameters: Text: One-Dimensional (1D); Number: One-Dimensional (1D); Binary: One-Dimensional (1D)
Other Properties Related to Input: No maximum sequence length

Output:

Output Type(s): Text (Docked Ligand 3D Positions, 3D), Text (Diffusion Trajectory Visualization Files, 3D), Number (List of Confidence Scores, 1D)
Output Format: Text: Structural Data File (SDF); Text: Protein Data Bank (PDB); Number: Array of Floating Point 32
Output Parameters: docked_ligand, visualizations_files, pose_confidence
Other Properties Related to Output: Output includes ranked binding poses with associated confidence scores. Higher confidence scores indicate more reliable predictions.
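
As a usage illustration, the inputs and outputs above map naturally onto an HTTP request to the hosted NIM. The following sketch assumes a NIM-style endpoint; the URL, environment variable, and JSON field names are assumptions drawn from the parameter lists above and should be checked against the API Reference.

```python
import os

import requests

# Hypothetical endpoint; confirm the exact URL and JSON schema in the API Reference.
INVOKE_URL = "https://health.api.nvidia.com/v1/biology/mit/diffdock"


def dock(protein_pdb_path, ligand_sdf_path, num_poses=10, steps=18, time_divisions=20):
    """Submit a blind-docking request and return poses sorted by confidence.

    The request/response field names mirror the input and output parameters
    listed above, but they are assumptions, not a verified schema.
    """
    with open(protein_pdb_path) as f:
        protein = f.read()
    with open(ligand_sdf_path) as f:
        ligand = f.read()

    payload = {
        "protein": protein,            # PDB text
        "ligand": ligand,              # SMILES, SDF, or Mol2 text
        "ligand_file_type": "sdf",     # assumed field for the ligand format
        "num_poses": num_poses,
        "steps": steps,
        "time_divisions": time_divisions,
        "save_trajectory": False,      # set True to also get visualization PDBs
    }
    headers = {
        "Authorization": f"Bearer {os.environ['NVIDIA_API_KEY']}",  # assumed env var
        "Accept": "application/json",
    }
    response = requests.post(INVOKE_URL, headers=headers, json=payload)
    response.raise_for_status()
    result = response.json()

    # Per the output list above: docked ligand SDF blocks and one confidence
    # score per pose; higher confidence indicates a more reliable prediction.
    poses = result.get("docked_ligand", [])
    confidences = result.get("pose_confidence", [])
    return sorted(zip(poses, confidences), key=lambda x: x[1], reverse=True)
```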

Software Integration:

Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g. GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.

Runtime Engine(s):

  • PyTorch

Supported Hardware Microarchitecture Compatibility:

  • NVIDIA Ampere
  • NVIDIA Ada Lovelace
  • NVIDIA Hopper
  • NVIDIA Grace Hopper

Supported Operating System(s):

  • Linux

The integration of foundation and fine-tuned models into AI systems requires additional testing using use-case-specific data to ensure safe and effective deployment. Following the V-model methodology, iterative testing and validation at both unit and system levels are essential to mitigate risks, meet technical and functional requirements, and ensure compliance with safety and ethical standards before deployment.
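
Before a local deployment, a quick check that the PyTorch runtime engine can reach a supported NVIDIA GPU can save debugging time; the snippet below is generic PyTorch, not DiffDock-specific.

```python
import torch

# Quick sanity check that the PyTorch runtime can reach a supported NVIDIA GPU.
if torch.cuda.is_available():
    device = torch.device("cuda")
    name = torch.cuda.get_device_name(0)
    major, minor = torch.cuda.get_device_capability(0)
    print(f"Using {name} (compute capability {major}.{minor})")
else:
    device = torch.device("cpu")
    print("No CUDA device found; inference will fall back to CPU and run slower.")
```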

Model Version(s):

DiffDock v2.2

Training & Evaluation Dataset:

Data Modality:

Other: 3D Molecular Structures (protein-ligand complexes)

Training Data Size:

486,000 protein-ligand complexes (450,000 from PLINDER + 36,000 selected from SAIR)

Training:

Link: PLINDER

Link: SAIR

Data Collection Method by dataset:

  • Human

Labeling Method by dataset:

  • Hybrid: Human & Automated

Properties (Quantity, Dataset Descriptions, Sensor(s)): 486,000 protein-ligand complexes (450,000 from PLINDER automatically curated using the PDB database + 36,000 selected from SAIR). For more information, see Technical Paper.

Evaluation:

Link: PoseBusters benchmark (PDB) set

Data Collection Method by dataset:

  • Human

Labeling Method by dataset:

  • Hybrid: Human & Automated

Properties (Quantity, Dataset Descriptions, Sensor(s)): Protein-ligand complexes from the PoseBusters benchmark set, curated from the PDB. For more information, see Technical Paper.

Inference:

Engine: PyTorch
Test Hardware:

  • A10G, A100, RTX6000-Ada, H100, L40, L40S

Ethical Considerations:

NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. Users are responsible for ensuring the physical properties of model-generated molecules are appropriately evaluated and comply with applicable safety regulations and ethical standards.

For more detailed information on ethical considerations for this model, please see the Model Card++ Explainability, Bias, Safety & Security, and Privacy Subcards.

Please report model quality, risk, security vulnerabilities or NVIDIA AI Concerns here.
