---
title: "3D Conditioning for Precise Visual Generative AI"
publisher: "nvidia"
type: "blueprint"
updated: "2026-02-13T18:42:38.466Z"
description: "Enhance and modify high-quality compositions using real-time rendering and generative AI output without affecting a hero product asset."
canonical: "https://build.nvidia.com/nvidia/conditioning-for-precise-visual-generative-ai"
---

## Use Case Description
The 3D Conditioning for Precise Visual Generative AI NVIDIA Omniverse Blueprint offers a streamlined solution for creating precise, on-brand images. It is powered by [NVIDIA NIM™](https://www.nvidia.com/en-us/ai/), [NVIDIA Omniverse™](https://www.nvidia.com/en-us/omniverse/), [OpenUSD](https://www.nvidia.com/en-us/omniverse/usd/), image2image models such as SDXL or tuned models like Realviz4.0, and [Shutterstock Generative 3D](https://www.shutterstock.com/discover/generative-ai-3d). The experience lets users choose the color of a hero asset, select the desired camera angle in the 3D scene, and then use generative AI to customize scene components such as backgrounds and props. Using this experience as inspiration, developers can download and customize the blueprint to unlock use cases ranging from scalable concepting and ideation to the creation of marketing assets for their brands or customers.

## Experience Walkthrough
The user is presented with a live 3D viewport showcasing the final product—the "hero asset"—created by a creative team. In this instance, the hero asset is an espresso machine with a coffee mug. This asset, representing the final design, includes all the final materials and product options. It is placed within a rudimentary scene that appears unfinished. Additional props, such as the cutting board, were generated using Shutterstock 3D Generator to populate the counter with objects unavailable to the creative team when the initial scene was created.

The user can orbit the hero asset using the left mouse button and zoom in or out with the mouse wheel. Navigation within the scene is designed to keep the hero asset in frame at all times. Once the user has identified a suitable camera angle, they can adjust the espresso machine's configuration. The machine has two control surface options, various color options, and a choice of coffee mug style. An additional control allows the user to select a pre-generated HDRi image (created with Shutterstock 360 HDRi Generator) to quickly modify the scene's background.

Next, the user inputs individual prompts to generate the background. These prompts are linked to specific objects, ensuring each prompt modifies a designated area within the scene. After the user enters the prompts, the system processes them along with the scene's layout using generative AI to create the final image. During this process, the system generates masks for the targeted prompts, which the creative team can use for further image processing.

## Architecture Diagram
![Architecture Diagram](https://assets.ngc.nvidia.com/products/api-catalog/conditioning-for-precise-visual-generative-ai/diagram.jpg)

## Included NIM
This blueprint uses the following [NIM](https://www.nvidia.com/en-us/ai/) microservices:  
[USD Search](https://build.nvidia.com/nvidia/usdsearch)  
[USD Code](https://build.nvidia.com/nvidia/usdcode-llama3-70b-instruct)  

## What’s included in the Blueprint
[NVIDIA Blueprints](https://nvidianews.nvidia.com/news/nvidia-and-global-partners-launch-nim-agent-blueprints-for-enterprises-to-make-their-own-ai) are customizable AI workflow examples that equip enterprise developers with NIM microservices, reference code, documentation, and a helm chart for deployment.

This blueprint provides a [reference](https://resources.nvidia.com/en-us-omniverse-product-configurator/blueprint-3d-conditioning) and a [workflow guide](https://github.com/NVIDIA-Omniverse-Blueprints/3d-conditioning/tree/main) that show how diffusion models, ControlNets, and their auxiliary tools can be integrated with Omniverse and streamed remotely. The primary container runs Omniverse and handles viewport streaming and message passing between the web front end and the second container. In the second container, users can run ComfyUI with an image2image model such as SDXL or a tuned model like Realviz4.0 using our default template, or bring their own custom diffusion pipeline to handle requests coming from the primary container. Both containers are packaged in the provided helm chart.

## Minimum System Requirements
Hardware Requirements

GPU: 2x NVIDIA L40 for deployment (one for rendering the scene and one for diffusion model inference), or 1x NVIDIA RTX™ 6000 Ada Generation for local use

CPU: x86\_64 architecture, 8 cores (Intel Core i7 (7th Generation) or AMD Ryzen 5)

System Memory: 64GB

Software Requirements

OS: Ubuntu 22.04

## Example Walkthrough with Sample Input/Output 
Primary Container with Omniverse

Input

Input Type(s): JSON with payloads of text prompts and dropdown options in text

Input Format: bytes

Output

Output Type(s): Viewport, Image

Output Format: stream
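
As a rough illustration of the input described above, the following sketch serializes per-object text prompts and dropdown selections into JSON and encodes the result as bytes. The field names (`prompts`, `options`, `background`, `color`, and so on) and the `build_request` helper are hypothetical and are not taken from the blueprint's actual message schema; consult the workflow guide for the real format.

```python
import json


def build_request(prompts: dict[str, str], options: dict[str, str]) -> bytes:
    """Serialize prompts and dropdown selections to UTF-8 encoded JSON bytes.

    The payload layout here is an assumption made for illustration only.
    """
    payload = {"prompts": prompts, "options": options}
    return json.dumps(payload, sort_keys=True).encode("utf-8")


# Hypothetical per-object prompts and product configuration options.
request = build_request(
    prompts={
        "background": "sunlit kitchen with a marble backsplash",
        "countertop": "warm oak wood",
    },
    options={"color": "red", "mug_style": "espresso"},
)

# The receiving side can recover the structure by decoding the bytes.
decoded = json.loads(request.decode("utf-8"))
print(decoded["options"]["color"])
```

Keeping each prompt keyed by the scene region it targets mirrors the walkthrough, where every prompt modifies a designated area of the scene.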

Second Container with Diffusion Model

Input

Input Type(s): JSON graph structure with embedded parameters (text, number, and image in base64) (10MB custom limit, which can be changed from the primary container)

Input Format: bytes

Output

Output Type(s): Image

Output Format: bytes
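
A minimal sketch of such a request is shown below: a small JSON node graph carrying a text prompt, a numeric parameter, and a base64-encoded image, with a size check against the 10MB limit. The node class names, the `image_base64` key, and the `build_graph_request` helper are hypothetical placeholders, and interpreting "10MB" as 10 × 1024 × 1024 bytes is an assumption; the actual graph comes from the ComfyUI workflow shipped with the blueprint.

```python
import base64
import json

# Assumption: the 10MB limit is interpreted as 10 MiB for this sketch.
MAX_REQUEST_BYTES = 10 * 1024 * 1024


def build_graph_request(
    prompt: str,
    steps: int,
    image_bytes: bytes,
    limit: int = MAX_REQUEST_BYTES,
) -> bytes:
    """Build a JSON graph with text, number, and base64 image parameters."""
    graph = {
        # Hypothetical node types; real class names depend on the workflow.
        "1": {"class_type": "HypotheticalTextPrompt", "inputs": {"text": prompt}},
        "2": {
            "class_type": "HypotheticalImageInput",
            "inputs": {"image_base64": base64.b64encode(image_bytes).decode("ascii")},
        },
        "3": {
            "class_type": "HypotheticalSampler",
            # ["1", 0] means "output 0 of node 1" in this illustrative graph.
            "inputs": {"steps": steps, "prompt": ["1", 0], "image": ["2", 0]},
        },
    }
    body = json.dumps(graph).encode("utf-8")
    if len(body) > limit:
        raise ValueError(f"request is {len(body)} bytes, limit is {limit}")
    return body


body = build_graph_request("modern kitchen, soft light", 20, b"\x89PNG-placeholder")
parsed = json.loads(body)
print(sorted(parsed.keys()))
```

Base64 inflates binary data by roughly one third, so a captured viewport buffer can approach the size limit sooner than its raw size suggests.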

The framework is designed to help software developers rapidly prototype and productize custom workflows that capture buffers from the viewport of a USD scene, take conditions in text form, and then generate an image that satisfies both constraints by running diffusion model inference. The blueprint includes a Docker image that automatically installs and deploys ComfyUI with an image2image model such as SDXL or a tuned model like Realviz4.0 for the included workflow, which relies heavily on ControlNet to handle multiple conditions.

## Technical Considerations 
The software conditions a USD scene by masking a target asset so that its 3D rendering is preserved, while the rest of the image is inferred by a diffusion model that takes a normal map or depth map as additional constraints alongside the text prompts. This gives users more artistic control over 3D assets and lets them choose the 3D rendering over the image generated by the diffusion model. To run the experience locally, download the blueprint and run either the web front end or the Omniverse application. Please consult our [documentation](https://github.com/NVIDIA-Omniverse-Blueprints/3d-conditioning/tree/main) to learn more about how we integrate 3D rendering and the diffusion model to achieve the final result.
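
The masking idea can be sketched as a per-pixel composite: wherever the hero-asset mask is set, the rendered pixel is kept, and everywhere else the generated pixel is used. This is a simplified illustration using plain Python lists, not the blueprint's implementation; the `composite` function, the tiny 2×2 images, and the boolean mask representation are all assumptions made for the example.

```python
Pixel = tuple[int, int, int]
Image = list[list[Pixel]]
Mask = list[list[bool]]


def composite(render: Image, generated: Image, hero_mask: Mask) -> Image:
    """Keep rendered pixels where the mask is True, generated pixels elsewhere."""
    if not (len(render) == len(generated) == len(hero_mask)):
        raise ValueError("render, generated, and mask must have the same height")
    return [
        [r if keep else g for r, g, keep in zip(render_row, gen_row, mask_row)]
        for render_row, gen_row, mask_row in zip(render, generated, hero_mask)
    ]


RED: Pixel = (255, 0, 0)   # stands in for the 3D-rendered hero asset
BLUE: Pixel = (0, 0, 255)  # stands in for the diffusion model output

render = [[RED, RED], [RED, RED]]
generated = [[BLUE, BLUE], [BLUE, BLUE]]
hero_mask = [[True, False], [False, True]]  # True marks hero-asset pixels

result = composite(render, generated, hero_mask)
print(result)
```

A production pipeline would typically operate on image arrays and may feather the mask edges, but the selection logic is the same.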

## Ethical Considerations
NVIDIA believes that Trustworthy AI is a shared responsibility and we have established policies and practices to enable the development of a wide array of AI applications. When downloaded or used under our terms of service, developers should work with their supporting model team to ensure the models meet requirements for the relevant industry and use case and address unforeseen product misuse. 

For more detailed information on ethical considerations for the models, please see the Model Card++ Explainability, Bias, Safety & Security, and Privacy Subcards. Please report security vulnerabilities or NVIDIA AI Concerns [here](https://www.nvidia.com/en-us/support/submit-security-vulnerability/). 

## Terms of Use
GOVERNING TERMS: This trial service is governed by the [NVIDIA API Trial Terms of Service](https://assets.ngc.nvidia.com/products/api-catalog/legal/NVIDIA%20API%20Trial%20Terms%20of%20Service.pdf). ADDITIONAL INFORMATION: RealvisXL license at [LICENSE.md · stabilityai/stable-diffusion-xl-base-1.0 at main.](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/blob/main/LICENSE.md)

## Bias

| Field | Response |
| -- | -- |
|Participation considerations from adversely impacted groups [(protected classes)](https://www.senate.ca.gov/content/protected-classes) in model design and testing: | None of the Above |
| Measures taken to mitigate against unwanted bias: | Scaling the dataset and adding augmentations. |

## Explainability

| Field | Response |
| -- | -- |
| Intended Application(s) & Domain(s): | Generating image embedding that is aligned with text for zero-shot classification. |
| Model Type: | Embedding Generation |
| Intended Users: | This model is intended for developers building search engines and classification, detection, or segmentation models. |
| Output: | Embedding Features |
| Describe how the model works: | This model has a vision extractor and a text encoder trained for embedding alignment |
| Technical Limitations: | Model needs a downstream task specific head to perform CV tasks. |
| Verified to have met prescribed NVIDIA standards: | Yes |
| Performance Metrics: | ImageNet zero-shot accuracy  |
| Licensing: | GOVERNING TERMS: This trial is governed by the [NVIDIA API Trial Terms of Service](https://assets.ngc.nvidia.com/products/api-catalog/legal/NVIDIA%20API%20Trial%20Terms%20of%20Service.pdf). The use of this model is governed by the [AI Foundation Models Community License Agreement](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-ai-foundation-models-community-license-agreement/). |

## Privacy

| Field | Response |
| -- | -- |
| Generatable or reverse engineerable personally-identifiable information (PII)? | None |
| Protected classes used to create this model? | Not Applicable (No PII) |
| Was consent obtained for any personal data used? | Not Applicable (No personal data) |
| How often is dataset reviewed? | Before Release |
| Is a mechanism in place to honor data subject right of access or deletion of personal data? | No |
| If personal data collected for the development of the model, was it collected directly by NVIDIA? | Not Applicable |
| If personal data collected for the development of the model by NVIDIA, do you maintain or have access to disclosures made to data subjects? | Not Applicable |
| If personal data collected for the development of this AI model, was it minimized to only what was required? | Not Applicable |
| Is there provenance for all datasets used in training? | Yes |
| Does data labeling (annotation, metadata) comply with privacy laws? | Yes |
| Is data compliant with data subject requests for data correction or removal, if such a request was made? | Yes |
| Applicable NVIDIA Privacy Policy | [https://www.nvidia.com/en-us/about-nvidia/privacy-policy/](https://www.nvidia.com/en-us/about-nvidia/privacy-policy/) |

## Safety & Security

| Field | Response |
| -- | -- |
| Model Application(s): | Embedding generation and retrieval for OpenUSD assets |
| Describe the life-critical application (if present). | None: Not within Operational Design Domain |
| Use Case Restrictions: | Abide by [https://www.nvidia.com/en-us/data-center/products/nvidia-ai-enterprise/eula/](https://www.nvidia.com/en-us/data-center/products/nvidia-ai-enterprise/eula/)  |
| Describe access restrictions (if any): | The Principle of least privilege (PoLP) is applied limiting access for dataset generation and model development.  |