
Copyright © 2026 NVIDIA Corporation

nvidia

synthetic-video-detector

Free Endpoint

NVIDIA Synthetic Video Detector is an AI-powered micro-service for detecting AI‑generated (synthetic) videos.

broadcast, diffusion models, forensics, media2, nvidia ai for media
Apply for Access
API Reference
Accelerated by DGX Cloud

Model Overview

Description:

The NVIDIA Synthetic Video Detection (SVD) model is designed to identify whether a given video is real or synthetic (AI-generated). It is optimized for detection of videos generated by diffusion models, and is intended for use in applications such as content & media creation, video authentication, digital forensics, and media integrity services. It is specifically designed to be robust to typical video compression. The output of the model is a predicted label indicating whether the input video is real or synthetic.
This model is ready for commercial/non-commercial use.

License/Terms of use:

GOVERNING TERMS: Use of this model is governed by the NVIDIA Software and Model Evaluation License and the DINOv3 License. ADDITIONAL INFORMATION: Apache 2.0.

Deployment Geography:

Global

Use case:

The NVIDIA SVD model is intended for general users who want to evaluate whether a video is synthetic or real. To use this model, the user submits a video and may then download or view the predicted label (a score from 0 through 1, where 0 indicates a real video and 1 a synthetic one) after the video has been evaluated by the model.
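As a concrete illustration of how a caller might interpret the returned score, here is a minimal Python sketch. The function name and the 0.5 cutoff are assumptions for illustration, not part of the service API:

```python
def interpret_score(score: float, threshold: float = 0.5) -> str:
    """Map the model's [0, 1] score to a human-readable label.

    NOTE: the 0.5 threshold is an illustrative assumption; callers
    should choose a cutoff that matches their own precision/recall needs.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    return "synthetic" if score >= threshold else "real"
```

A stricter deployment (e.g., digital forensics) might lower the threshold to flag more videos for human review.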

Release Date:

Build.Nvidia.com - 04/16/2026
NGC - 04/16/2026 via https://catalog.ngc.nvidia.com/orgs/nim/teams/nvidia/containers/synthetic-video-detector

Reference(s):

  • Corvi, et al. “Seeing What Matters: Generalizable AI-generated Video Detection with Forensic-Oriented Augmentation”, 39th Conference on Neural Information Processing Systems (NeurIPS 2025).

  • This model is an improved version of the SAFE Challenge winner at ICCV 2025.

Model Architecture:

Architecture Type: Vision Transformer (ViT)

Network Architecture: Ensemble of DINOv2 and DINOv3 architectures

This model was developed based on an ensemble of DINOv2 and DINOv3 backbones.

Number of model parameters: 1.72 × 10^8
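The card does not specify how the two backbones are fused. A common approach for such ensembles is a weighted average of the per-backbone scores; the sketch below is purely hypothetical, and the equal weights are an assumption, not NVIDIA SVD's documented fusion rule:

```python
def ensemble_score(dinov2_score: float, dinov3_score: float,
                   w2: float = 0.5, w3: float = 0.5) -> float:
    """Hypothetical score fusion: weighted average of the outputs of
    the DINOv2 and DINOv3 branches. Equal weights are an assumption;
    the actual fusion used by NVIDIA SVD is not documented here."""
    total = w2 + w3
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return (w2 * dinov2_score + w3 * dinov3_score) / total
```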


Input(s):

Input Type(s): Video

Input Format(s): Video: .mp4

Input Parameters: Video: Three-Dimensional (3D)

Other Properties Related to Input: Video frames are cropped to 504×504 and normalized before being passed to the NVIDIA SVD model.
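The crop-and-normalize step can be sketched as follows. The center placement of the crop and the ImageNet mean/std constants are assumptions; the card only states that frames are cropped to 504×504 and normalized:

```python
import numpy as np

CROP = 504
# ImageNet statistics -- an assumption; the exact normalization
# constants used by the service are not documented in this card.
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def preprocess_frame(frame: np.ndarray) -> np.ndarray:
    """Center-crop an HxWx3 uint8 frame to 504x504 and normalize it."""
    h, w, _ = frame.shape
    if h < CROP or w < CROP:
        raise ValueError("frame smaller than crop size")
    top, left = (h - CROP) // 2, (w - CROP) // 2
    crop = frame[top:top + CROP, left:left + CROP]
    x = crop.astype(np.float32) / 255.0  # scale to [0, 1]
    return (x - MEAN) / STD              # per-channel normalization
```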

Output(s):

Output Type(s): Text (1D).

Output Format(s): Text: String

Output Parameters: Text: One-Dimensional (1D)

Other Properties Related to Output: The text output indicates a number per video frame (in the range [0,1]), where 0 indicates a real video and 1 indicates a synthetic video.
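Since the output carries one score per video frame, a caller typically aggregates the frame scores into a video-level verdict. A minimal sketch, assuming mean aggregation and a 0.5 cutoff (both illustrative choices, not documented behavior):

```python
def video_verdict(frame_scores: list[float], threshold: float = 0.5) -> dict:
    """Aggregate per-frame scores in [0, 1] into a video-level label.

    Mean aggregation and the 0.5 cutoff are assumptions for
    illustration; other policies (max, trimmed mean) are possible.
    """
    if not frame_scores:
        raise ValueError("no frame scores supplied")
    mean = sum(frame_scores) / len(frame_scores)
    return {"score": mean,
            "label": "synthetic" if mean >= threshold else "real"}
```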

Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g. GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.

Software Integration:

Supported Hardware Microarchitecture Compatibility:

  • [NVIDIA Ampere]
  • [NVIDIA Blackwell]
  • [NVIDIA Turing]
  • [NVIDIA Lovelace]

This feature requires NVENC/NVDEC hardware. GPUs without NVENC/NVDEC support are not supported, including the A100, H100, B100, and B200 products. For details about supported GPUs and H.264 YUV formats, refer to the Video Encode and Decode GPU Support Matrix.

Preferred/Supported Operating System(s):

  • Linux

The integration of foundation and fine-tuned models into AI systems requires additional testing using use-case-specific data to ensure safe and effective deployment. Following the V-model methodology, iterative testing and validation at both unit and system levels are essential to mitigate risks, meet technical and functional requirements, and ensure compliance with safety and ethical standards before deployment.

Model Version(s):

v1.0.0

Training, Testing and Evaluation Dataset:

Training Dataset

Link: internally created hybrid dataset including:

  • Panda70M (only URLs)
  • Pyramid Flow
  • Open-Sora Plan
  • Allegro
  • cosmos-predict1

Video Training Data Size: Less than 10,000 Hours

Data Collection Method by dataset:

Hybrid: Automated, Synthetic

  • videos sourced from publicly available internet-scale data referenced in the Panda70M dataset, with timestamps
  • synthetic videos generated by 4 text-to-video generators (Pyramid Flow, Open-Sora Plan, Allegro, cosmos-predict1)

Labeling Method by dataset:

  • Human

Properties (Quantity, Dataset Descriptions, Sensor(s)): Total 1400 real videos and 1400 synthetic videos per generator.

Testing Dataset

Link: internally created hybrid dataset including:

  • Panda70M (only URLs)
  • Pyramid Flow
  • Open-Sora Plan
  • Allegro
  • cosmos-predict1

Data Collection Method by dataset:

Hybrid: Automated, Synthetic

  • videos sourced from publicly available internet-scale data referenced in the Panda70M dataset, with timestamps
  • synthetic videos generated by 4 text-to-video generators (Pyramid Flow, Open-Sora Plan, Allegro, cosmos-predict1)

Labeling Method by dataset:

  • Automated

Properties (Quantity, Dataset Descriptions, Sensor(s)): Total 300 real videos and 1400 synthetic videos per generator.

Evaluation Dataset:

Link: Internal hybrid dataset.

Benchmark Score: accuracy 85.64%

Data Collection Method by dataset: Human

Labeling Method by dataset: Human

Properties: 4000 videos in the evaluation benchmark.

Inference:

Engine: TensorRT
Test Hardware:

  • CUDA 12.8-compatible desktop and server hardware.

Ethical Considerations:

NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. For more detailed information on ethical considerations for this model, please see the Model Card++ Bias, Explainability, Safety & Security, and Privacy Subcards.

Please make sure you have proper rights and permissions for all input image and video content. If an image or video includes people, personal health information, or intellectual property, the generated output will not blur or maintain the proportions of the subjects included.

Please report model quality, risk, security vulnerabilities or NVIDIA AI Concerns here.