
An image edit model specialized for Omniverse synthetic to photographic solder-light style captured at NVIDIA PCB inspection stations
Qwen-Image-Edit-NVPCB-OVSL2SL transforms synthetic solder-light printed-circuit-board (PCB) component crops — produced in NVIDIA Omniverse — into the photographic solder-light style captured at NVIDIA PCB inspection stations, so that downstream PCB inspection models trained on real solder-light photographs can be augmented with Omniverse-generated synthetic data. The release is an NVIDIA fine-tuned version of the Qwen-Image-Edit image-to-image diffusion pipeline (diffusion transformer, Qwen2.5-VL text encoder, Qwen-Image VAE, tokenizer, image processor, and scheduler configuration), specialized for the Omniverse → NVPCB solder-light style transfer. Qwen-Image-Edit-NVPCB-OVSL2SL v1.0.0 was developed by NVIDIA as part of the NVPCB inspection-data harmonization pipeline. This model is ready for commercial use.
Governing Terms: Use of this model is governed by the NVIDIA Open Model Agreement. Additional Information: Apache License, Version 2.0.
Global
NVIDIA engineers and researchers building PCB inspection / automated optical inspection (AOI) systems that need to be augmented with Omniverse-generated synthetic data. The model converts Omniverse-rendered solder-light PCB component crops into the photographic solder-light style produced by NVIDIA's physical inspection stations, closing the sim-to-real style gap so that inspection models trained on real photographs can be evaluated or augmented with synthetic Omniverse data. This model is not intended to be the primary inspection decision-maker; it is a sim-to-real data-translation step. Inspection pass/fail decisions must come from a downstream inspection model with human review.
GitHub 06/02/2026 via https://github.com/NVIDIA/paidf-augmentation
Architecture Type: Transformer (diffusion transformer with cross-modal conditioning)
Network Architecture: This release is a self-contained HuggingFace diffusers pipeline directory. All of the following components are redistributed as part of the release artifact:
transformer/ (NVIDIA fine-tuned, redistributed): the upstream Qwen-Image-Edit flow-matching image-to-image diffusion transformer, fine-tuned by NVIDIA on the attention and feed-forward projections of QwenImageTransformerBlock (to_q, to_k, to_v, to_out, the cross-attention add_{q,k,v}_proj / to_add_out, and img_mlp / txt_mlp).
text_encoder/ (redistributed unmodified): Qwen2.5-VL; attends over both the input image and the instruction prompt.
vae/ (redistributed unmodified): Qwen-Image VAE.
tokenizer/, processor/, scheduler/, and model_index.json (redistributed unmodified): Qwen-Image-Edit tokenizer, image processor, scheduler configuration, and pipeline entry point.
The pipeline is loaded with diffusers.QwenImageEditPipeline.from_pretrained(...).
This model was developed based on Qwen-Image-Edit.
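To make the brace shorthand in the transformer/ entry concrete, the following sketch expands it into explicit projection names and checks whether a module path falls in the fine-tuned set. The module paths used in the example are hypothetical illustrations, not names read from the released checkpoint, and the matching rule (any path component equal to a listed name) is an assumption.

```python
# Projection names as listed in the card for QwenImageTransformerBlock.
ATTENTION_PROJECTIONS = ("to_q", "to_k", "to_v", "to_out")

# Expansion of the shorthand "add_{q,k,v}_proj / to_add_out".
CROSS_ATTENTION_PROJECTIONS = tuple(f"add_{n}_proj" for n in "qkv") + ("to_add_out",)

# Feed-forward sub-blocks named in the card.
FEED_FORWARD = ("img_mlp", "txt_mlp")

FINE_TUNED_NAMES = set(
    ATTENTION_PROJECTIONS + CROSS_ATTENTION_PROJECTIONS + FEED_FORWARD
)


def is_fine_tuned(module_path: str) -> bool:
    """Return True if any dotted component of the path is a listed name.

    Assumption: a module counts as fine-tuned when one of its path
    components matches a name from the card exactly.
    """
    return any(part in FINE_TUNED_NAMES for part in module_path.split("."))


if __name__ == "__main__":
    # Hypothetical module paths, for illustration only.
    for path in (
        "transformer_blocks.0.attn.to_q",
        "transformer_blocks.0.attn.add_k_proj",
        "transformer_blocks.0.img_mlp.net.2",
        "norm_out.linear",
    ):
        print(path, is_fine_tuned(path))
```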
Number of model parameters: Approximately 2.0 × 10^10 (20B) total parameters in the released checkpoint. Of these, ~1.7 × 10^8 (170M) parameters in the diffusion transformer, roughly 0.85% of the total, were updated by NVIDIA fine-tuning; the remaining parameters come from the upstream Qwen-Image-Edit pipeline (transformer + Qwen2.5-VL text encoder + Qwen-Image VAE) and are redistributed unmodified.
Cumulative Compute: ~0.6 GPU-hour total on a single NVIDIA H100 SXM (~0.5 GPU-hour for the 1500-step fine-tuning run + ~5 GPU-minutes for the latent/embedding cache build).
Estimated Energy and Emissions for Model Training: ~0.4 kWh and ~0.16 kgCO2e total. Methodology: GPU energy = 0.6 GPU-hour × 0.7 kW (H100 SXM rated TDP) × 0.6 average utilization (typical for LoRA fine-tuning, which is not consistently GPU-bound) ≈ 0.25 kWh; multiplied by an assumed datacenter PUE of 1.5 to account for cooling and facility overhead ≈ 0.38 kWh, reported as ~0.4 kWh; multiplied by 0.4 kgCO2e/kWh (U.S. national-grid average) ≈ 0.15 kgCO2e from the unrounded energy, or ≈ 0.16 kgCO2e from the rounded 0.4 kWh figure reported here. Estimates use rated TDP rather than measured wall power and are therefore conservative upper bounds; actual emissions depend on the specific datacenter's PUE and regional grid carbon intensity at training time.
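The methodology above is a chain of multiplications, which the sketch below reproduces so the intermediate values can be checked or recomputed with a different PUE or grid intensity. All constants are the ones stated in the card; the function name is illustrative. Carried through without intermediate rounding, the chain gives about 0.15 kgCO2e, while rounding the energy to 0.4 kWh first gives the 0.16 kgCO2e headline figure.

```python
# Constants taken from the methodology stated in the card.
GPU_HOURS = 0.6          # total GPU-hours (fine-tuning + cache build)
TDP_KW = 0.7             # H100 SXM rated TDP in kW
UTILIZATION = 0.6        # assumed average utilization
PUE = 1.5                # assumed datacenter power usage effectiveness
GRID_KG_PER_KWH = 0.4    # assumed grid carbon intensity, kgCO2e per kWh


def training_footprint(gpu_hours=GPU_HOURS, tdp_kw=TDP_KW,
                       utilization=UTILIZATION, pue=PUE,
                       grid_kg_per_kwh=GRID_KG_PER_KWH):
    """Return (GPU energy kWh, facility energy kWh, emissions kgCO2e)."""
    gpu_kwh = gpu_hours * tdp_kw * utilization
    facility_kwh = gpu_kwh * pue
    kg_co2e = facility_kwh * grid_kg_per_kwh
    return gpu_kwh, facility_kwh, kg_co2e


if __name__ == "__main__":
    gpu_kwh, facility_kwh, kg_co2e = training_footprint()
    print(f"GPU energy:      {gpu_kwh:.3f} kWh")
    print(f"Facility energy: {facility_kwh:.3f} kWh")
    print(f"Emissions:       {kg_co2e:.3f} kgCO2e")
```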
Input Type(s): Image, Text
Input Format(s):
Input Parameters:
Other Properties Related to Input: Fine-tuned at target area 262,144 pixels (~512×512); other resolutions are accepted by the underlying diffusers pipeline but NVIDIA fine-tuning was not performed at them, so style fidelity may degrade. Input must be a single Omniverse-rendered PCB component crop on an approximately black background, similar to the synthetic solder-light style the model was fine-tuned on. The accompanying instruction prompt is the fixed English sentence used during training, recovered automatically from the cached training metadata (cache/_meta.pt) at inference time; the trained prompt is:
"Render this PCB component crop in the style of an NVPCB raked-solder-light photograph: dark reddish board with bright orange-red and blue specular highlights on the solder pads, photorealistic textures."
Output Type(s): Image
Output Format(s): PNG; Red, Green, Blue (RGB)
Output Parameters: Two-Dimensional (2D)
Other Properties Related to Output: Output resolution matches the input target area used during caching (~512×512). The output preserves component identity and board layout while transferring the rendering style from Omniverse-synthetic solder-light to NVPCB photographic solder-light.
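The input and output properties above can be tied together in a short inference sketch. It assumes the released pipeline directory is available locally and a CUDA GPU is present. The function names, the step count, the seed, and the rule of rounding each side to a multiple of 32 are assumptions for illustration rather than settings documented for this release. The prompt is hard-coded here for simplicity instead of being read from cache/_meta.pt.

```python
import math

# Fixed instruction prompt quoted in the card.
PROMPT = (
    "Render this PCB component crop in the style of an NVPCB "
    "raked-solder-light photograph: dark reddish board with bright "
    "orange-red and blue specular highlights on the solder pads, "
    "photorealistic textures."
)

TARGET_AREA = 512 * 512  # 262,144 pixels, the fine-tuning target area


def target_dimensions(width, height, target_area=TARGET_AREA, multiple=32):
    """Pick a (width, height) near target_area that keeps the aspect ratio.

    Assumption: each side is rounded to a multiple of 32.
    """
    aspect = width / height
    new_w = math.sqrt(target_area * aspect)
    new_h = new_w / aspect
    new_w = max(multiple, round(new_w / multiple) * multiple)
    new_h = max(multiple, round(new_h / multiple) * multiple)
    return new_w, new_h


def translate_crop(pipeline_dir, input_path, output_path, steps=50, seed=0):
    """Translate one Omniverse crop to the photographic style, save as PNG."""
    # Heavy imports are kept local so the helper above works without them.
    import torch
    from diffusers import QwenImageEditPipeline
    from PIL import Image

    pipe = QwenImageEditPipeline.from_pretrained(
        pipeline_dir, torch_dtype=torch.bfloat16
    )
    pipe.to("cuda")

    image = Image.open(input_path).convert("RGB")
    width, height = target_dimensions(*image.size)

    result = pipe(
        image=image,
        prompt=PROMPT,
        width=width,
        height=height,
        num_inference_steps=steps,  # assumed value, not from the card
        generator=torch.Generator(device="cpu").manual_seed(seed),
    )
    result.images[0].save(output_path, format="PNG")
```

A call such as translate_crop("path/to/pipeline_dir", "crop.png", "styled.png") would write a single RGB PNG; both file names are placeholders.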
Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g. GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.
Runtime Engine(s):
PyTorch (diffusers QwenImageEditPipeline)
Supported Hardware Microarchitecture Compatibility:
NVIDIA Ampere (A100)
NVIDIA Hopper (H100)
NVIDIA Lovelace (RTX 40-series)
Supported Operating System(s):
The integration of foundation and fine-tuned models into AI systems requires additional testing using use-case-specific data to ensure safe and effective deployment. Following the V-model methodology, iterative testing and validation at both unit and system levels are essential to mitigate risks, meet technical and functional requirements, and ensure compliance with safety and ethical standards before deployment.
v1.0.0 — qwen-image-edit-nvpcb-OVSL2SL (NVIDIA fine-tune of the upstream Qwen-Image-Edit pipeline, 1500 training steps). The released artifact is a self-contained HuggingFace diffusers pipeline directory containing transformer/, text_encoder/, vae/, tokenizer/, processor/, scheduler/, and model_index.json.
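Because the release is a directory rather than a single file, a quick completeness check before loading can save a failed multi-gigabyte load. The sketch below only verifies that the entries named above exist; it does not validate their contents, and the function name is illustrative.

```python
from pathlib import Path

# Entries the card lists for the released pipeline directory.
EXPECTED_DIRS = (
    "transformer", "text_encoder", "vae",
    "tokenizer", "processor", "scheduler",
)
EXPECTED_FILES = ("model_index.json",)


def missing_components(pipeline_dir):
    """Return the expected entries that are absent from pipeline_dir."""
    root = Path(pipeline_dir)
    missing = [f"{name}/" for name in EXPECTED_DIRS
               if not (root / name).is_dir()]
    missing += [name for name in EXPECTED_FILES
                if not (root / name).is_file()]
    return missing


if __name__ == "__main__":
    import sys

    absent = missing_components(sys.argv[1] if len(sys.argv) > 1 else ".")
    print("complete" if not absent else "missing: " + ", ".join(absent))
```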
** Image Training Data Size
** Text Training Data Size
** Data Collection Method by dataset
** Labeling Method by dataset
Properties: 228 paired component crops with Omniverse-rendered synthetic solder-light as input and NVPCB photographic solder-light as target. Modality: image + a fixed English instruction prompt. Content nature: synthetic renders and photographs of inanimate PCBs (NVIDIA-internal). No personal data, no copyright-protected web content, no machine-generated text/speech. No human subjects are depicted.
Data Collection Method by dataset:
Labeling Method by dataset:
Properties: Not Applicable. The model is qualitatively evaluated on the held-out validation split (~5%) via side-by-side HTML reports rather than against a separate test split.
** Data Collection Method by dataset
** Labeling Method by dataset
** Properties: Held-out validation pairs (~5%) from the same paired set, used to produce qualitative side-by-side comparison HTML reports and to compute CLIP-style embedding distance between generated and ground-truth target images.
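The card does not say which distance is used on the embeddings. As one common choice, the sketch below computes cosine distance between a generated image's embedding and its ground-truth target's embedding, then averages over pairs. The embeddings are assumed to be plain lists of floats already produced by some CLIP-style encoder; the encoder itself is out of scope here, and the function names are illustrative.

```python
import math


def cosine_distance(a, b):
    """Cosine distance, 1 - cos(a, b), between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return 1.0 - dot / (norm_a * norm_b)


def mean_pair_distance(pairs):
    """Average cosine distance over (generated, target) embedding pairs."""
    distances = [cosine_distance(gen, tgt) for gen, tgt in pairs]
    if not distances:
        raise ValueError("at least one pair is required")
    return sum(distances) / len(distances)
```

Lower values mean the generated image sits closer to its photographic target in embedding space; identical embeddings give 0 and orthogonal ones give 1.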
Acceleration Engine: PyTorch (native, bf16 weights) via diffusers.QwenImageEditPipeline.from_pretrained(...), loaded directly from the released pipeline directory.
Test Hardware:
NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. Developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.
Please make sure you have proper rights and permissions for all input image and video content; if image or video includes people, personal health information, or intellectual property, the image or video generated will not blur or maintain proportions of image subjects included.
Users are responsible for model inputs and outputs. Users are responsible for ensuring safe integration of this model, including implementing guardrails as well as other safety mechanisms, prior to deployment.
For more detailed information on ethical considerations for this model, please see the Bias, Explainability, Privacy, and Safety & Security tabs of ModelCard++.
Please report model quality, risk, security vulnerabilities or NVIDIA AI Concerns here.